ABSTRACT

This thesis systematically and comprehensively analyzes 
available personnel data to determine if a significant 
relationship exists between measures of intelligence and 
academic performance, and career promotion rate for
Noncommissioned Officers. Forty thousand Noncommissioned
Officer (NCO) records were analyzed to determine this, using
three approaches. 

The first approach was a sequential procedure which 
progressed from analysis of individual variables through 
multivariate regression models. The second approach focused 
on analysis of NCO's who scored in the top three percent of 
promotion rate. The third approach used more advanced
statistical techniques, including the use of principal
components and factor analysis, to better identify the most 
influential explanatory variables. 

During the analysis, eight measures of intelligence and 
academic ability were used as explanatory variables. Four 
control variables were included in the analysis to 
discriminate between subcategories of NCO's. They were:
sex, career field, race, and paygrade.

Throughout the analysis, consideration of Army promotion
and accession policy was included. Knowledge of these
policies resulted in elimination of some special groups which
had received promotions under significantly different 
conditions than the rest of the sample. An example of this 
was Reserve and National Guard members called to active duty. 

This study found that there was significant statistical 
evidence to show that a high level of Armed Forces
Qualification Test (AFQT) score and prior service academic
accomplishment will correspond to a higher promotion rate. 
Also, in-service measures of NCO education and performance 
testing were good indicators of promotion rate. 

However, there was significant variance associated with 
the explanatory relationship. As a result, a useful
predictive model could not be designed using regression
methods. Although the model could predict promotion averages 
for major population subcategories, it was unreliable when 
used solely with the AFQT variable. 

The findings of this study suggest two policy
recommendations. The first recommendation was a confirmation
of the constraints placed on AFQT category and high school
diploma status by the 1984 Defense Authorizations Act. The
second recommendation was to require promotion boards to
consider NCO schooling level and performance test scores in
their proceedings, but to avoid directly tying either score to
promotion, in terms of a minimum quota or scaled promotion
point scale.

Finally, a suggestion was given for further research to 
investigate the underlying reasons for different attrition 
patterns observed among racial and ethnic groups.

I. INTRODUCTION

A. BACKGROUND

In almost any organization, one hopes that individuals at 
high levels of authority are gifted with higher than average 
intelligence. Correspondingly, one would think that, given 
equal work effort, a more intelligent person will advance 
more rapidly than his contemporaries in an organization. 

It is not difficult, however, to find examples which 
contradict our perceptions of the role of intelligence in 
career advancement. In almost any field one can remember an 
individual who was not the most intellectually gifted, but 
through hard work and persistence, or other less quantifiable 
traits, advanced equally or better than persons of higher 
measured mental ability. There is ample room for other 
influences to overwhelm the value of a person's intelligence 
in the eyes of a superior. An unattractive personality, an 
inability to apply that intelligence to the tasks at hand, 
and a myriad of other flaws can discredit the merit of raw
intelligence.

The degree to which intelligence affects advancement
lies in the area of complex interaction between individuals
and organizations. It carries with it much of the
uncertainty of quantification of human performance.

Despite ample room for exceptions, the concept of a
general reward for being more intelligent still seems
reasonable. It may be, however, that clearly seeing its
manifestation requires looking at a large number of people
who have been affected by as similar a set of opportunities
for advancement as possible. It is the task of this thesis
to investigate this relationship within a fairly restricted,
but numerically large population. The population is one
which has had fundamental raw statistics uniformly obtained,
and where policies to promote personnel are unambiguous and
well documented.

B. PURPOSE 

The purpose of this thesis is to answer a central
question: Does a significant relationship exist between
measures of intelligence and academic ability, and an
individual's promotion rate as a Noncommissioned Officer?
Put more simply, does being smarter, as measured by initial
test scores, or being better schooled, indicate that a person
will perform better and, hence, advance more quickly than his
peers?

The answer to this question has important implications
for Army policies of recruitment, retention, and promotion.
It is also a matter of general interest to social scientists.

C. ORGANIZATION 

This thesis is organized fundamentally as a data analysis 
investigation. Chapters I and II provide preliminary 
information on the nature of the study variables, and briefly
review some related articles which have addressed this topic.
The remaining chapters discuss the analysis of approximately
forty thousand Noncommissioned Officer (NCO) records using
three related approaches. The first approach is a fairly 
standard procedure of experimental data analysis. This 
procedure begins with analysis of fundamental attributes of 
individual variables, then advances through successive 
increases in dimensionality and complexity. The second 
approach views a subset of the population which distinguishes
itself by being in the top three percent of the NCO promotion 
rates. Comparison of these top performers to the remainder 
of the population identifies attributes which are found to be 
significantly different, and hence, are possibly an 
associated cause for rapid advancement. In the third 
approach, the statistical methods of principal components and 
factor analysis are used to provide an alternative method of 
critical variable selection, as well as to lend credibility 
to the results of the other two approaches. 
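
As a minimal sketch of the second approach, the following Python
fragment splits a set of records into the top three percent by
promotion rate and the remainder, then compares the mean of one
attribute between the two groups. The field names `prate` and `afqt`
and the sample records are hypothetical, and this is not the software
used in the thesis. A difference in means found this way would still
need a significance test before being treated as meaningful.

```python
from statistics import mean


def split_top_fraction(records, key, fraction=0.03):
    """Split records into (top, rest) by descending value of `key`."""
    ranked = sorted(records, key=lambda r: r[key], reverse=True)
    n_top = max(1, round(len(ranked) * fraction))
    return ranked[:n_top], ranked[n_top:]


def mean_difference(top, rest, attribute):
    """Mean of `attribute` in the top group minus its mean in the rest."""
    return mean(r[attribute] for r in top) - mean(r[attribute] for r in rest)


# Hypothetical records: a promotion rate and an AFQT percentile.
records = [{"prate": 0.05 + 0.001 * i, "afqt": 30 + (i % 60)} for i in range(100)]

top, rest = split_top_fraction(records, "prate")
print(len(top), len(rest))
print(mean_difference(top, rest, "afqt"))
```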

D. PRELIMINARY INFORMATION 

This section contains an initial discussion about the 
nature of the data, a general overview of the Army NCO 
promotion system, and a synopsis of the analytical tools used 
in this thesis. As previously mentioned, there is a degree 
of looseness in the effectiveness of measurement for 
intelligence and academic data, and also some confounding 
phenomena in Army promotion policy. Early recognition of
these problems should set the degree of caution which is
needed in reviewing the subsequent chapters of analysis. The
section on analytical tools is intended to inform the reader
of the conditions under which the data analysis was
conducted, and the hardware and software used.

1. Intelligence Test Scores

a. General

The data for intelligence test scores falls into
the category sometimes referred to as Defined Measurement. A
Defined Measurement is one where the property being
considered cannot be measured directly. [Ref. 1:p. 6] As a
result, a related measure is substituted for measurement of
the actual property. In this case, the property is
intelligence, and the presumed related measurements are test
scores from a particular battery of tests.

The efficacy of intelligence tests as a representative
measure for intellectual ability is itself an issue
surrounded by controversy. This controversy has been the
topic of entire books and studies. The testing done by the
Army is the Armed Services Vocational Aptitude Battery, or
ASVAB. Although not designed specifically as an intelligence
test, the ASVAB does predict general trainability.
Additional research has shown that the mathematical and
verbal portions of the ASVAB have a high correlation to the
ACT, PSAT, and SAT college entrance examinations. [Ref. 2]
The ASVAB has been studied, improved, and used for over forty
years. A recent article by Jenson [Ref. 3:p. 35], in
Measurement and Evaluation in Counseling and Development,
states:

"To the degree that success in various occupations and 
training programs requires different levels of general 
ability (often called intelligence or IQ), an ASVAB 
composite (it hardly matters which one) will be as 
validly predictive as any test now on the market. . . It 
seems that the new ASVAB-14 is near the limit of 
refinement, psychometrically."

Generally then, the ASVAB is a well documented and 
established aptitude test. Although the military does not 
specifically attempt to determine the intelligence of its 
potential candidates, academic portions of the ASVAB test 
have shown themselves to be reasonably defined measurements 
of intelligence. 

b. Specific Tests. 

The ASVAB consists of a battery of ten subtests.
Composites of the subtests of the ASVAB are used to determine
the overall acceptability of an individual requesting
enlistment, and for which field he or she would best be
suited. From the entire battery of tests, two derived scores
are taken as aggregate measures of intelligence. The first
is the GT, or general intelligence score. This score is the
aggregation of three submodules: word knowledge, paragraph
comprehension, and arithmetic reasoning. The second derived
measure of intelligence is the Armed Forces Qualification
Test score, or AFQT. This score considers four submodules:
word knowledge, paragraph comprehension, arithmetic
reasoning, and numerical operations. [Ref. 10:sec 1-0, p. 1]
An AFQT score is reported as a percentile score representing
the examinee's relative standing in reference to a specific
population.

There has recently been some additional manipulation of
the AFQT score. In October of 1984, the reference population
for assignment of an individual's AFQT percentile was shifted
from a base reference population of 1944 to that of 1980. A
base reference population is a set of values designed to
represent how the raw AFQT scores of the entire American
youth population would be distributed. This set of values
was originally designed in 1944, and had not been updated
until 1980. This thesis utilized the 1980 base AFQT
percentiles. A transformation of test percentiles for
soldiers who enlisted prior to 1980 was effected by the
Defense Manpower Data Center (DMDC), and all subsequent
Department of the Army records have been computed based on
the 1980 reference. A listing for AFQT percentile
transformations can be found in APPENDIX A.
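
The effect of changing the reference population can be illustrated
with a short Python sketch. It assigns a percentile to the same raw
score against two different reference samples. Both samples are made
up for illustration and are not the 1944 or 1980 norms; the function
name and the rounding rule are likewise assumptions, and the actual
conversion used in this thesis is the DMDC transformation listed in
Appendix A.

```python
from bisect import bisect_right


def percentile_rank(raw_score, reference_scores):
    """Percentile standing of raw_score within a sorted reference sample.

    Returns the rounded percentage of reference scores at or below
    raw_score, with a floor of 1.
    """
    at_or_below = bisect_right(reference_scores, raw_score)
    return max(1, round(100 * at_or_below / len(reference_scores)))


# Two made-up reference populations (NOT the 1944 or 1980 norms).
reference_a = sorted(range(0, 100))    # raw scores 0..99
reference_b = sorted(range(10, 110))   # raw scores 10..109

raw = 50
pct_a = percentile_rank(raw, reference_a)
pct_b = percentile_rank(raw, reference_b)
print(pct_a, pct_b)
```

The same raw score receives a lower percentile against the
higher-scoring reference sample, which is why percentiles assigned
under different base populations cannot be compared directly.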

GT scores, which are expressed as the sum of the raw
test scores, have not been manipulated. However, unlike
the case with AFQT score, soldiers have been allowed to
retake their tests to increase their original GT scores.
Retesting was introduced in 1982 when a minimum GT score of
120 was enforced on eligibility for promotion to NCO rank.

2. Academic Scores

a. General

The data used for academic ability is also a
defined measurement, similar to the measures for
intelligence. Specifically, the property of academic ability
is being represented by a simple assignment of the number of
years of education. This value is independent of the quality
of education, and the grades that any given individual may
have received. This study assumes that continued attendance
and progression through the educational system is inherently
indicative of academic ability. For example, a high school
graduate is assumed to have more academic ability than an
individual with an eighth grade education. The informational
value of academic scores is thus not as high as desired. It
is treated in analysis as only an ordinal scaled variable.

b. Specific 

Three academic scores are used in the study:
present education level, education level upon entry into
the Army, and military education since entry. Because advanced
professional schooling is made available only to those 
individuals who have superior service records, the military 
education score carries with it some additional information 
relative to the performance of the NCO. 

3. Promotion Scores

Promotion within the Army is a closely supervised and
somewhat complicated procedure. It is the product of a
considerable number of policies which are not uniformly
applied across the population. Instead, they are applied
within rank structure, within career field, or even as a
function of years of education. Thus, although the
computation of an individual's promotion rate is an easy
task, that value may have been influenced by several policies
that were peculiar to the individual.

a. General

Promotion of NCO's is governed by Army Regulation
AR 600-200. This regulation establishes requirements for
eligibility, and outlines the process of selection. The
system views the individual's performance as a whole. This
includes a composite score based on performance scores,
commander's ratings, service awards, and review by a board of
senior NCO's. This composite point value is used as a
threshold value for the Department of the Army to use when
promoting individuals to the next higher paygrade, as slots
become available. The slots are accounted for by career
management field, and as such, the minimum threshold for a
combat soldier to be promoted may be different than that of a
support soldier. A general observation is that career fields
with more technical orientation have higher promotion point
thresholds, and consequently, longer times to advancement
than those in the larger and less technically oriented career
fields.

AR 600-200 also sets minimum times of service and grade
which an individual must have served to be considered for
promotion. Unless superseded by a special policy, the
shortest period for promotion to E-5 is two years, and to
E-6 is four years. This rate includes waivers for both time
in service and time in grade. Promotion to E-6 in four years
requires that the individual be advanced to E-5 in two years.

b. Specific

Because of the lack of uniformity of promotion
within the Army population, in this thesis we have taken
considerable care to identify and address discontinuities
which would confound promotion based on merit. This includes
the elimination of some data, and the computation of three
different promotion rate scores. The governing principle for
manipulation or restriction of data was to produce a sample
population in which each individual started from the same
point in the rank structure, and had equal opportunity for
advancement by merit. Chapter III, Overview of the Data,
discusses in detail the identified problems and what
corrective action was taken.

4. Analytical Tools Used

This section briefly identifies the hardware and
software used in analysis.

a. Hardware

Computational resources used for analysis
included an IBM 3033 System 370 mainframe computer running
the MVS batch system. Additionally, analysis was done for
small data sets using a standard IBM microcomputer.

b. Software

Two software packages were used for the majority
of the data analysis. SAS Version 5 was used predominantly
for analysis resulting in tabular output, such as principal
components and factor analysis. [Ref. 4,5] Grafstat, an
unreleased IBM mainframe data analysis and plotting program,
was utilized for analysis requiring graphical output and for
confirmation of SAS tabular results. [Ref. 6,7]

E. SUMMARY 

The objective of this introduction has been to adequately 
frame the scope of the topic, and to present sufficient 
background to the reader so that he or she is alerted to some 
of the difficulties inherent in a topic of this nature. 
Also, this will establish a reference for some of the tools 
used to conduct the analysis. 

The length of this section is indicative of the degree of 
preparation required to analyze a relationship which has 
significant complications in both dependent and independent 
variables. Although the list of assumptions and the 
stripping of aberrant data makes one cautious about the 
reality of such a study, each event should be considered on 
its ability to uncover the answer to the central question of 
this thesis. The central question, again, is whether or not a
significant relationship exists between measures of
intelligence and academic ability, and an individual's
promotion rate as a Noncommissioned Officer. It is important
to learn whether measures of intelligence and academic
ability are important indicators of promotion in the Army,
and if so, how strong that relationship is. If sufficiently
reliable and believable relationships can be determined, then
policies could be designed to better identify and develop
capable individuals for positions of leadership.

The analysis of this thesis reduced the effects of 
confounding policies, such as discriminatory promotion and 
accession programs. It also used a sufficiently large sample 
size, which allowed the averages to outweigh the exceptions. 
It drew on data from standard personnel records, and made the 
most effective use of that information.

II. A REVIEW OF PREVIOUS STUDIES

The topic of relating intelligence to some aspect of
performance is an extensive and rich area of study. It is a
particular topic of interest to social scientists and
military manpower specialists. As a demonstration of the
quantity of work done in this area, a simple
cross-referencing of the words intelligence test and
performance produced a list of 237 citations from Lockheed's
DIALOG online information files. Restriction of available
references to those utilizing military intelligence test
scores and statistical analysis of those tests relative to
some performance measure still results in a large number of
citations. Within this restriction there is a variety of
study methodologies. A study can originate from an in-house
military analysis, a contracted study done by a commercial
analytical institute, or an academic institution making use
of military data as its medium for analysis.

The nature of the data is also varied. Several studies
readministered the ASVAB tests to a selected test population;
other studies used IQ and other intelligence measures in
addition to the ASVAB. The performance side of the
relationship had an extensive number of dependent variables.
Examples of performance measures were: results of written
exams, military skills test results, minority advancement,
and comparison to collegiate ACT, PSAT, and SAT tests.

This chapter will review four of the most closely 
related studies, concentrating for each one on: 

1. The objective of the study. 

2. The methodology used in analysis. 

3. The conclusion reached. 

The first analysis is from Are Smart Tankers Better? 
AFQT and Military Productivity. [Ref. 8] This study is
essentially an in-house military analysis, the authors being
Army officers assigned to the Office of Economic and Manpower
Analysis, at West Point, New York. As described in the
title, the paper presents the results of an investigation in
which the crews of tanks were scored on their ability to
destroy targets on live fire ranges. The AFQT score of the
gunner and tank commander was one of several explanatory
variables, with the tank scores as the dependent variable.
The analysis methodology used a log-log production model with
ordinary least squares regression.

The result of their analysis is best summarized in this 
paragraph from the study: 

"That there exists a positive, statistically 
significant relationship between AFQT and performance, is 
a powerful result. The coefficients on the model means 
that if we move, for example, from the AFQT score for an 
average Category IV TC to the AFQT score for an average 
Category IIIA TC, (a 200% increase), we will increase the
performance on Table 8 (the tank scoring exercise) by
approximately 20.3%."

In this study then, AFQT was found, by means of least squares
regression, to have a definitive relationship to a
well-defined skill measure, the conduct of tank firing.
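
In a log-log model of the form ln(y) = a + b ln(x), the coefficient
b is an elasticity: multiplying x by a factor multiplies y by that
factor raised to the power b. The Python sketch below backs out the
single elasticity that would reproduce the quoted figures if AFQT were
the only input that changed. This is an illustrative reading of the
quotation under that assumption, not a coefficient reported by the
cited study, and the function names are hypothetical.

```python
import math


def implied_elasticity(input_increase, output_increase):
    """Exponent b of y = a * x**b implied by two fractional increases."""
    return math.log(1.0 + output_increase) / math.log(1.0 + input_increase)


def predicted_gain(elasticity, input_increase):
    """Fractional change in y for a fractional change in x under y = a * x**b."""
    return (1.0 + input_increase) ** elasticity - 1.0


# Figures quoted above: a 200% increase in AFQT, a 20.3% gain on Table 8.
b = implied_elasticity(2.00, 0.203)
print(round(b, 3))
print(round(predicted_gain(b, 2.00), 3))
```

Note that a 200% increase corresponds to tripling the input, so the
denominator is ln(3) rather than ln(2).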

The second study is an analysis done at the University of
Iowa by the Cada Research Group titled: On Predicting
Success in Training for Males and Females; Marine Corps
Clerical Specialties and ASVAB Forms 6 and 7. [Ref. 9] This
report uses the ASVAB score as an explanatory variable for
success of recruits in training. The methodology used is
primarily regression; however, the scope of the regression
concentrates on identifying differences between male and
female performance. The implicit result in the study's
discussion of the sex score differences is that the
regressions performed for each category were of useful
predictive value. An interesting note about this study was
that the inclusion of high school completion reduces the
difference between the male and female regression
coefficients.

The third study is a section of articles used in the
Report to the House and Senate Committees on Armed Services,
Defense Manpower Quality, Volume II, Army Submission.
[Ref. 10] The section of interest to this thesis was a study
done by the U.S. Army Training and Doctrine Command (TRADOC)
Systems Analysis Activity (TRASANA). The study uses AFQT, as
well as education level, sex, paygrade, time in service, time
in Military Occupational Specialty (MOS), and a dummy
variable reflecting General Equivalency Diploma (GED)
completion as explanatory variables. GED is a rating given
to individuals who did not graduate from high school, but who
have taken examinations to be rated as equivalent to a high
school graduate. A battery of tests given under controlled
conditions resulted in a net score which was made the
dependent variable. The battery of tests was designed so as
to represent how proficient a soldier was in his specific
career field. The battery included a written test as well as
a hands-on proficiency test.

The analysis method used was linear regression, with the 
inclusion of a Durbin Instrument as a correction tool for 
AFQT. The results are again best summarized from the report: 

"The most important result is that AFQT Category I-IIIA 
soldiers performed approximately 10% better overall than 
IIIB soldiers. . . Furthermore, AFQT was a much more 
important influence on performance in virtually all 
instances than either education or experience, whether 
measured in terms of time in service, MOS, or unit. 
Thus, these results strongly support the validity of AFQT 
as a predictor of performance in these military 
occupational specialties."

This report then, is very similar in conclusion to the 
tank gunnery report, in which AFQT was shown through 
regression to have a significant and measurable effect on 
soldier performance in skill related tasks. 

The last study reviewed is also from the collection found
in the Defense Manpower Study. [Ref. 11] The topic for this
study was the estimation of promotion rate. It is presently
the most similar study to the central theme of this thesis.
Using AFQT as one of the independent variables, a duration
model is applied to estimate the expected speed of promotion.
This model was applied within two categories, the paygrade
and the career field of the NCOs. This promotion estimation
study approaches the aggregation of data in a different
manner as well. Specifically, by evaluating the possibility
of promotion for each individual over a series of years, the
dimension of time was entered into analysis. A significant
advantage of including the time dimension was that changes in
the categorical levels of the population could be accounted
for, such as race or sex.

The methodology used in the promotion estimation study is
considerably more complex than in the previous studies.
Rather than using standard regression models, the study uses
the Generalized Linear Model form. Specifically, the form of
the predictive model is a log likelihood function using the
Weibull shape parameter. The explanatory variables include
education, AFQT, marital status, race, number of dependents,
time in service, sex, and high school completion status. By
using the Weibull model, the application of explanatory
variables which are not continuous, such as sex, high school
completion status, and marital status, is more proper.
Additionally, there are no requirements for the normality
assumptions for the residuals, and therefore, less
subjectivity to the appropriateness of the model with respect
to the independent variables. This method, however, does not
consider any in-service information and was calculated only
for very specific CMF and paygrade combinations. The results
are summarized as follows:

"A review of these promotion results reveals two
trends. First, even after controlling for high school
diploma status, AFQT Category I-IIIA soldiers are
promoted approximately 10% more rapidly than IIIB
soldiers. Second, high school completion is less
important than AFQT score in determining promotion rates.
The remarkable aspect of this last result is that
educational attainment is an explicit part of the Army's
promotion point system, while AFQT scores are not. These
trends are true for both promotion to E-5 and promotion
to E-6."
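
The cited study's exact specification is not reproduced in this
thesis. As a rough sketch of what a Weibull duration likelihood looks
like, the Python fragment below evaluates the log likelihood of a set
of times to promotion for a given shape and scale. The treatment of
soldiers not yet promoted as right-censored, the function name, and the
sample data are all assumptions made for illustration. In a regression
setting the scale would additionally be made a function of the
explanatory variables; that step is omitted here.

```python
import math


def weibull_log_likelihood(durations, promoted, shape, scale):
    """Log likelihood of Weibull-distributed times to promotion.

    durations: months observed for each soldier (must be > 0)
    promoted:  True if promotion was observed, False if right-censored
    """
    total = 0.0
    for t, event in zip(durations, promoted):
        z = (t / scale) ** shape
        if event:
            # Log density: log(k/s) + (k - 1) * log(t/s) - (t/s)**k
            total += math.log(shape / scale) + (shape - 1.0) * math.log(t / scale) - z
        else:
            # Log survival: -(t/s)**k
            total -= z
    return total


# Hypothetical data: months to promotion; the last soldier is not yet promoted.
months = [24.0, 30.0, 36.0, 48.0]
observed = [True, True, True, False]
print(weibull_log_likelihood(months, observed, shape=2.0, scale=35.0))
```

Maximizing this function over the shape and scale (and, in the
regression form, over the coefficients) yields the fitted model; no
normality assumption on residuals enters at any point.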

Since positive results have generally been the result,
one may wonder why another study should be undertaken.
First, this thesis is in response to a request by the Office
of the Deputy Chief of Staff for Personnel (ODCSPER) for
[...] in the relationship of AFQT to success in
the Army. Secondly, this thesis will be different in its
approach and analytical procedures. Following is a list of
the unique characteristics of this thesis:

1. The perspective of this thesis is that the results will 
be used as a management tool, or as an explanatory 
method for active duty Army personnel. In that light, 
the study utilizes information collected from the 
individual's in-service record, such as his Skill 
Qualification Scores, and his NCO Schooling levels. 
Similar to accession related studies, this analysis 
includes intelligence, academic, and categorical 
information as potential explanatory variables. 
However, the intent is not to justify accession of high 
quality soldiers, but to investigate the trends of 
promotion for active duty personnel as a function of 
available personnel data.

2. This study conducts significant investigation
into the data to identify and correct anomalies which
would confound the relationship in question.

3. Statistical analysis is done from the bottom up,
rather than by direct movement into regression models.
This approach finds that strict parametric models are
subject to error due to the inability of some data
variables to meet distributional assumptions necessary
for parametric analysis. The study then moves to
nonparametric means to approach the issue.

4. For regression models, given the cautions on their use, 
an additional sample population is tested using the 
model. Thus, the results from the initial model can be 
considered to have more believability and fidelity than 
a model based on analysis of a single population
sample.

5. The use of a large data set.¹

6. Several explanatory variables have been made 
available from the DMDC data base which have not been 
used in previous studies. They include the initial 
education at time of entry, NCO education level, and a 
race variable with six categories. 

7. The choice of promotion as the dependent variable 
rather than a set of performance tests. Although prone 
to more uncertainty than results of performance tests, 
promotion is in many ways an ultimate performance 
measure. The service, like any other organization, 
recognizes superior performance by promoting and 
advancing individuals to higher positions of authority. 
As such, promotion rate, despite its problems, has a
strength of recognition well beyond that of technical
performance.¹

8. This study uses graphical methods for depiction of many 
of the methods of analysis. 



¹Study number four from the Defense Manpower Study uses both
large data sets and promotion as a dependent variable.

III. OVERVIEW OF THE DATA

A. INTRODUCTION

A critical aspect of this thesis was the selection and 
screening of data. Two general guidelines were applied in 
creating the data set. First, the data set had to 
demonstrate a level of homogeneity in that the NCO's 
considered would all have served under similar enlistment and 
advancement policies. Secondly, the selection of individual 
records needed to be random and without unintentional bias to 
meet the requirements for a representative sample set. 
Section III C. describes in detail the measures taken to
ensure that the above two attributes were established in the
study data set.

Recoding of data values into numerical equivalents was
required for several personnel record fields. As an example,
the level of Military Schooling, which is the NCO's
in-service schooling level, was recorded as mixed
alpha-numeric characters. Transformation involved rank
ordering the available levels of schooling in ascending
hierarchical order and substituting a numeric value for the
alpha-numeric value. Chapter IV discusses in detail the
background of each variable. Finally, as a check on the
effects of manipulating and restricting the sample data set,
section III D. provides a comparison of statistics for the
entire U.S. Army NCO database, versus the sample data set
used in this thesis.
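
The recoding step described above can be sketched in a few lines of
Python. The schooling codes shown are invented placeholders, not the
actual personnel record values, and the helper name is hypothetical.
The only point illustrated is the substitution of a rank for each
alpha-numeric code once the codes have been placed in ascending order.

```python
def build_ordinal_codes(levels_in_ascending_order):
    """Map each level to its rank, starting at 0 for the lowest level."""
    return {level: rank for rank, level in enumerate(levels_in_ascending_order)}


# Hypothetical alpha-numeric schooling codes, lowest level first.
SCHOOL_LEVELS = ["N", "A1", "B2", "C3"]
codes = build_ordinal_codes(SCHOOL_LEVELS)

# Recode a few example records.
raw_values = ["A1", "N", "C3"]
recoded = [codes[value] for value in raw_values]
print(recoded)
```

Because only the order of the levels is meaningful, the resulting
numbers should be treated as ordinal values, not as interval or ratio
measurements.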
B. DESCRIPTION OF THE VARIABLES



The data variables used in this study fall into three 
categories: control variables, intelligence variables, and 
promotion variables. The first two categories, control and 
intelligence, were used as explanatory variables, while the 
promotion variables were used as the dependent variables. A 
brief description of each variable is tabulated in Table I . 





TABLE I  Summary of Variables in Sample 

Variable  Category   Meaning                              Value     Scale 

Dependent 

PRATE     Promotion  Raw Promotion Rate: number of        .041-.21  Ratio 
                     promotions per month to most 
                     recent promotion 
RATE      Promotion  Promotion rate difference from       2.2-9.4   Ratio 
                     average for that paygrade 
                     (normalized) 
PRA       Promotion  Promotion rate difference from       3.4-8.0   Ratio 
                     average for that paygrade and 
                     CMF (normalized) 

Explanatory 

SEX       Control    Male/Female                          0/1       Nominal 
CMF       Control    Career Management Field              11-99     Nominal 
RACETH    Control    Race/Ethnic group                    1-5       Nominal 
PAYGD     Control    Paygrade                             5-7       Ordinal 
GTSCR     Intell     General Intelligence Score           0-160     Ordinal 
AFQTP     Intell     Armed Forces Qualification Test      1-100     Ordinal 
                     Score Percentile 
OAFQTP    Intell     Same as AFQTP, referenced on         1-100     Ordinal 
                     1980 population 
EIMCAT    Intell     Mental Category; based on OAFQTP     1-8       Ordinal 
HIYRED    Intell     Highest Year of Education upon       1-12      Ordinal 
                     entry into Army 
EDLVL     Intell     Present Education Level              1-12      Ordinal 
NCOE      Intell     Military Education Level Attained    0-13      Ordinal 
PQSCR     Intell     Army Proficiency Test                0-100     Ratio 

A more detailed description of each of the study 
variables will be given in the first part of Chapter IV, 
Successive Data Analysis. 

C. PREPARATION OF THE DATA 

Preparation of the data began with acquiring fifty 
thousand records from the U.S. Army Military Personnel Center 
in Alexandria, Virginia. Initial restrictions on the data 
were established to allow inclusion of only NCO's with a date 
of entry after January 1, 1976. Further, NCO's selected had 
to be members of the Regular Army, and not Reserve or 
National Guard forces. These restrictions provided for 
observation of only those NCO's who were recruited a 
reasonable time period following the ending of the Viet Nam 
War, and following the establishment of the All-Volunteer 
Force. Restricting the NCO's to Regular Army soldiers 
focused the study on the standing forces alone, and avoided 
confounding as a result of different promotion and accession 
policies in the Reserve and Guard Forces. 

The records requested were randomly drawn by taking every 
fifth individual from an estimated population of 250,000 
meeting the above restrictions. The fifty thousand MILPERCEN 
records were then matched and merged with a similar personnel 
database from the Defense Management Data Center (DMDC) in 
Monterey, California. The DMDC database holds additional 
information, including the ability to distinguish high 
school equivalency certificate holders from actual graduates, 
the highest year of education of the soldier at time of 
enlistment, and AFQTP and EIMCAT scores renormed for a 1980 
population. 

After the merging, data records which had missing values 
in any of the critical variable fields were dropped. There 
were approximately ten thousand records missing critical 
data. Following initial analysis of promotion rates, two 
additional restrictions were applied against the remaining 
records. 

First, a grouping of several hundred promotion rates 
showed that individuals had been promoted to the rank of E-5 
at rates which were as high as one promotion per month. 
Cross referencing of service numbers identified this sub-group 
as NCO's who had served in Reserve or Guard units and 
who, for a variety of reasons, had been called for active 
duty. As such, they were allowed by regulation to carry with 
them an accelerated promotion to their former rank. 
Subsequently, a serial number match and elimination was done 
for all NCO's with recent listing as Reserve or Guard status. 

A second source of unusual promotion rates at the E-5 
level became apparent in some of the more technically 
oriented career management fields, the medical field in 
particular. Research into Army special recruitment policy 
indicated that during the early 1980's special provisions 
were made to allow persons with background ability in certain 
technical fields to enter the Army and be promoted to NCO 
status within six months, or in certain cases to receive NCO 
status immediately following basic training.^ To correct for 
these anomalies, all promotion rates which fell outside the 
maximum time periods considering application of both waivers 
were discarded. 
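
The main preparation steps in this section (drawing every fifth record, matching and merging the two files, and dropping records with missing critical fields) can be sketched as follows. This is a minimal illustration under stated assumptions: the field names `id`, `paygd`, and `oafqtp`, the helper names, and the toy records are all hypothetical, and the original processing was certainly not done this way.

```python
def systematic_sample(population, step=5, start=0):
    """Take every `step`-th record, as in an every-fifth-individual draw."""
    return population[start::step]


def merge_on_id(primary, secondary, key="id"):
    """Inner-join two lists of record dicts on a shared identifier."""
    lookup = {rec[key]: rec for rec in secondary}
    return [{**rec, **lookup[rec[key]]} for rec in primary if rec[key] in lookup]


def drop_incomplete(records, required):
    """Keep only records with a non-missing value in every required field."""
    return [r for r in records if all(r.get(f) is not None for f in required)]


# Toy stand-ins for the two personnel files (all values are invented).
milpercen = [{"id": i, "paygd": 5 + i % 3} for i in range(20)]
dmdc = [{"id": i, "oafqtp": None if i == 10 else 40 + i} for i in range(0, 20, 2)]

sample = systematic_sample(milpercen)   # records with ids 0, 5, 10, 15
merged = merge_on_id(sample, dmdc)      # only ids 0 and 10 appear in both files
clean = drop_incomplete(merged, ["paygd", "oafqtp"])
print([r["id"] for r in clean])
```

The inner join mirrors the text's requirement that a record be present in both databases, and the completeness filter corresponds to the removal of records missing critical data.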

D. COMPARISON TO TOTAL ARMY STATISTICS 

In this section, selected attributes of the sample data 
set and the complete U.S. Army database are briefly compared, 
with the intent of checking the representativeness of the 
sample set. 

Population attributes such as distribution of sex, Career 
Management Fields, and paygrade were obtained from the 
complete U.S. Army database records consisting of over 
250,000 NCO's. 

As described in paragraph 3.B, the sample data set of 
50,000 selected records had been filtered to contain only 
personnel who entered the Army after 1976. Screening of 
those 50,000 records for completeness of data and uniformity 
of promotion policy reduced the number in the sample set to 
approximately 38,000. It was prudent, then, to check the 
final sample set to see if it retained its representative 
character as a random sample. It should be noted, however, 
that this comparison will not occur for all study variables. 



Reasons for this include non-availability of records from the 
MILPERCEN database, and cases where the statistic was 
produced through computation by the author, promotion rates 
being the principal example. 

^ MSG Knopp, NCOIC Defense Management Data Center, West. 
El Estero Drive, Monterey CA 93946. 

1. Comparison of Army versus Sample Summary Statistics 

Formal hypothesis testing for means or distributions 
with ANOVA was unavailable due to computational and software 
restrictions. However, since the intent of this section was 
simply to identify any population shifts, and the magnitude 
of those shifts, observation of summary statistics is assumed 
to be sufficient. Specifically, the means and the standard 
deviations of four variables were obtained from both the 
entire NCO population data set and the thesis sample data 
set. The percent difference between the variable means was 
computed and expressed relative to the thesis sample data. A 
table of comparative statistics and the percent difference is 
shown in Table II. 
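
The percent difference described above can be sketched as follows. The formula, the difference in means divided by the sample mean, is an assumption about how "expressed relative to the thesis sample data" was computed; it reproduces the reported SEX and RACETH entries and gives roughly ten percent for AFQTP. The function name is hypothetical.

```python
def percent_difference(army_mean, sample_mean):
    """Difference in means as a percentage of the sample mean (assumed formula)."""
    return 100.0 * (sample_mean - army_mean) / sample_mean


# (Total Army mean, Sample mean) pairs as reported in Table II.
means = {"AFQTP": (48.3, 53.4), "SEX": (1.09, 1.12), "RACETH": (1.63, 1.65)}

for name, (army, sample) in means.items():
    diff = percent_difference(army, sample)
    direction = "higher" if diff > 0 else "lower"
    print(f"{name}: sample mean is {abs(diff):.1f}% {direction}")
```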



TABLE II  Total Army vs Sample Summary Statistics 

            Total Army (250,000)   Sample (37,854)      Percent 
Variable    Mean     Std Dev       Mean     Std Dev     Difference 

AFQTP       48.3     25.2          53.4     20.9        Sample 10% > 
SEX         1.09     .283          1.12     .328        Sample 2.7% > 
RACETH      1.63     .991          1.65     .942        Sample 1.2% > 
PAYGD       5.75     .597          5.27     .464        Sample 5.2% < 


The three variables AFQTP, SEX, and PAYGD have 
noticeable changes between the Sample and the Total Army, 
while the RACETH variable doesn't appear to have been 
affected much by sampling. A closer look at the discrete 
distributions, and an overall conclusion about differences in 
the two data sets, follows. 

2. Discrete Distributions 

Figures 3.1 and 3.2 illustrate differences in the 
discrete distributions for paygrade and race respectively. 
Both plots are clustered bar charts, in which the percentage of 
each level of the discrete variable for the Total Army 
and for the Sample are plotted next to each other. 



ARMY VS SAMPLE PAYGRADE PERCENTAGES 

Figure 3.1 

ARMY VS SAMPLE RACE PERCENTAGES 

Figure 3.2 

Observation of the tabular data and bar charts shows 
that there are some differences between the two populations. 
Specifically, the sample contains more lower ranking 
personnel, slightly more women, and significantly higher 
AFQTP related scores. The racial make-up of the sample 
appears to be similar. 

The restriction of random sampling to only those persons 
entering the service after 1976 can directly or indirectly 
explain these differences. First, the lower average paygrade 
is a direct result of promotion policy, in which it is 
impossible to achieve a rank above E-7 in less than ten 
years. Hence, the sample population should demonstrate a 
lower average paygrade. Secondly, the slight increase in the 
proportion of women might be explained by a general opening 
up of the services to women in the late seventies and early 
eighties. Thirdly, the higher AFQTP is a direct result of 
policy restrictions begun in Fiscal Year 1981, and formalized 
by the 1984 Defense Authorization Act. This placed quality 
constraints on AFQT Category and high school diploma status. 
[Ref. 10:sec 1-0, p.1] Whether these restrictions, or the 
general improvement of social acceptance of the military 
services, resulted in this AFQT improvement is a question 
which would require significant study in itself. 

In short then, the sample is different in several ways 
from the total NCO population. It should be noted, however, 
that these results are intentional. The shifts caused by 
restricting the sample to after 1976 are felt to be less 
dangerous to the study than the alternative of including 
soldiers who were accessed during the draft and the era of 
Viet Nam War policies. Finally, it is only a matter of time, 
unless significant changes in accession and promotion policy 
occur, before the character demonstrated by the sample data 
set will constitute the norm for all NCOs. Thus, it is 
concluded that the study sample is satisfactory. 

IV. SUCCESSIVE DATA ANALYSIS 

A. INTRODUCTION 

In this chapter the results of a systematic method for 
data analysis will be reported. This method of analysis 
followed a format which is described by Chambers in Graphical 
Methods for Data Analysis. [Ref. 12] This procedure develops 
an understanding of the data, beginning with simple 
univariate descriptive procedures, then progressing through 
several increases in dimensionality of variables, and finally 
into the more complex inferential procedures of model 
building and multivariate regression. An abbreviated outline 
of this procedure is shown below. 

1. Analysis of single variables. 

2. Comparison of variable distributions. 

3. Analysis of paired variables. 

4. Multivariate graphical analysis. 

5. Linear Models including: 

a. Simple Regression 

b. Multivariate Models 

In addition to these steps, this procedure will be 
supplemented with several non-graphical measures, such as 
ANOVA, ANCOVA, and several tabular nonparametric methods. It 
should be noted that this analysis reports only those 
procedures which are considered an essential step in 
investigation, or whose results provided an observation of 
merit. Many available procedures have not been used in this 
chapter, as a consequence of the data failing to meet 
distributional assumptions, and for other reasons which would 
make such analysis inappropriate. During the development of 
this chapter, the results of each level of analysis will 
specify why the next set of analysis procedures was pursued. 
Alternatively, if a popular class of procedures is 
disregarded, the logic for disregarding it is explained. 

The objective of detailing this procedure is to present a 
thorough depiction of the nature of the variables, and to 
explain the development of resulting inferences and models. 

B. UNIVARIATE ANALYSIS 

1. Dependent Variables 
a. PRATE 

(1) General. The variable PRATE represents the 
raw promotion rate of a particular individual. Numerically, 
it is the total of promotions per month up to the most recent 
promotion. 

(2) Value. The variable PRATE was computed 
using data obtained from the DMDC database. The time to most 
recent promotion in months was found by subtracting the basic 
pay entry date from the date of latest award of rank. This 
number then became the denominator of a ratio having the 
individual's rank, or equivalently, the total number of 
promotions the individual has received, as the numerator: 


$$\mathrm{PRATE} = \frac{\text{Individual's Latest Rank}}{(\text{Award Date of Latest Rank}) - (\text{Date of Entry in Army})}$$

Ranks were numerically represented with a score of 5 for 
an E-5 Sergeant, and with 6 and 7 for values of the next two 
ranks. The resulting units of measurement for the PRATE 
variable were: units of promotion per month of service. 
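
As a worked illustration, the PRATE ratio can be computed from two dates. This is a minimal sketch: the function names are hypothetical, and counting whole calendar months while ignoring the day of the month is an assumption, since the text does not state how partial months were treated.

```python
from datetime import date


def months_between(start, end):
    """Whole calendar months from start to end (day of month ignored)."""
    return (end.year - start.year) * 12 + (end.month - start.month)


def prate(paygrade, entry_date, latest_rank_date):
    """Raw promotion rate: paygrade number divided by months to latest promotion."""
    months = months_between(entry_date, latest_rank_date)
    if months <= 0:
        raise ValueError("latest rank date must fall after the entry date")
    return paygrade / months


# Hypothetical soldier: entered January 1978, promoted to E-5 in January 1982.
example = prate(5, date(1978, 1, 1), date(1982, 1, 1))
print(f"{example:.5f}")
```

Under this convention an E-5 promoted after 48 months has a PRATE of 5/48, which falls near the sample mean reported for the variable.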

(3) Attributes of the Variable . The variable 
PRATE qualifies as a continuous variable with a ratio scale. 
The continuous nature of the variable relies on the fact that 
the number of months service combined with three rank 
structures yields sufficient combinations of values, actually 
190 in all, to use as measures. 

There are some inherent problems with the raw PRATE 
score, since promotion policies are in effect which set 
minimum time thresholds for promotion. Thus, the promotion 
rate of an individual who is presently an E-5 is not directly 
comparable to the promotion rate of an E-7 whose three promotions 
have been affected by the minimum time policy. Generally, the 
minimum time in service between promotions grows as rank 
increases, and more senior soldiers will normally have lower 
raw promotion rates. 

A second source of bias is potentially found in the 
Career Management Field (CMF) of the soldier. Army promotion 
policy is based on a system of minimum performance points to 
be attained within a CMF in order to be considered for 
promotion. Generally, the more technical fields will have 
higher promotion point thresholds than non-technical fields. 

The distribution of the variable PRATE and its summary 
statistics are shown in Figure 4.1. The shape of the 
histogram is positively skewed, demonstrating a steep 
ascending slope in the first partitions, then a generally 
flat shape until just past the median value. After the 
median value, a gradual downward sloping tail occurs. A 
rough interpretation of this shape is that there appear to 
be a few individuals who are promoted at very fast rates, 
followed by a block of average promotion rates, then a 
diminishing tail of individual promotion rates which fall to 
the right of the seventy-fifth percentile. 



PRATE HISTOGRAM AND STATISTICS 

HISTOGRAM TABLE 

X               : PRATE 
SELECTION       : ALL 
X LABEL         : PRATE 
NO. OF ELEMENTS : 37854 
X MEAN          : 0.10946 
STD. DEVIATION  : 0.036322 
SKEWNESS        : 0.59367 
KURTOSIS        : 2.5854 
5-PERCENTILE    : 0.061225 
25-PERCENTILE   : 0.08 
MEDIAN          : 0.10204 
75-PERCENTILE   : 0.13514 
95-PERCENTILE   : 0.17857 
X MIN.          : 0.041667 
X MAX.          : 0.20833 

Figure 4.1 

Distribution transformation of this variable was not 
attempted, primarily because its usefulness in testing or 
modelling is limited by the problems associated with the bias 
factors described above. 
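
The moment-based summary statistics reported with each histogram can be sketched as below. Two assumptions are made: population (divide by n) moments are used, and kurtosis is reported in its non-excess form, where a normal distribution scores 3. The second assumption is suggested by the reported values near 2.5 for roughly bell-shaped variables, but the original software's exact definitions are not stated.

```python
from statistics import mean


def moment_statistics(data):
    """Return mean, population std dev, skewness, and non-excess kurtosis."""
    n = len(data)
    m = mean(data)
    m2 = sum((x - m) ** 2 for x in data) / n  # second central moment
    m3 = sum((x - m) ** 3 for x in data) / n  # third central moment
    m4 = sum((x - m) ** 4 for x in data) / n  # fourth central moment
    return m, m2 ** 0.5, m3 / m2 ** 1.5, m4 / m2 ** 2


# Small invented sample with a long right tail.
values = [1, 2, 2, 3, 3, 3, 4, 10]
m, sd, skew, kurt = moment_statistics(values)
print(f"mean={m:.3f} sd={sd:.3f} skewness={skew:.3f} kurtosis={kurt:.3f}")
```

A positive skewness value indicates a longer right tail and a negative value a longer left tail, which is how the sign should be read in the figures of this chapter.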

b. RATE 

(1) General. The variable RATE is a re-expression 
of the variable PRATE. It has bias due to 
individual rank removed by normalizing each individual score 
relative to his or her paygrade. 

(2) Values. To compute the variable RATE, the 
average PRATE value for each paygrade was calculated, as well 
as the standard deviation for that paygrade. Individual 
scores were then normalized by the transformation: 

$$\mathrm{RATE}_i = \frac{\mathrm{PRATE}_i - \text{AVERAGE for that Rank}}{\text{STANDARD DEVIATION for that Rank}}$$

(3) Attributes of the Variable. The variable 
RATE is also a continuous ratio scale variable, as it is a 
transformation of PRATE. 

The removal of influence due to rank was confirmed by 
computing the correlation coefficient between the variables 
RATE and PAYGD. As seen in Table X, a value of near zero 
resulted where the previous correlation coefficient for PRATE 
and PAYGD had been -.495. Thus, the transformation to RATE 
from PRATE results in a variable uncorrelated with PAYGD. 
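
The within-paygrade normalization and the correlation check can be sketched as follows. The data are invented, the function names are hypothetical, and the use of the population standard deviation is an assumption, since the text does not say which form was used.

```python
from collections import defaultdict
from statistics import mean, pstdev


def normalize_within_groups(values, groups):
    """Convert each value to a z-score relative to its own group."""
    by_group = defaultdict(list)
    for v, g in zip(values, groups):
        by_group[g].append(v)
    stats = {g: (mean(vs), pstdev(vs)) for g, vs in by_group.items()}
    return [(v - stats[g][0]) / stats[g][1] for v, g in zip(values, groups)]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


# Invented PRATE values: higher paygrades tend to show lower raw rates.
paygd = [5, 5, 5, 6, 6, 6, 7, 7, 7]
prate = [0.15, 0.12, 0.18, 0.10, 0.09, 0.12, 0.07, 0.06, 0.08]

rate = normalize_within_groups(prate, paygd)
print(f"PRATE vs PAYGD: {pearson(prate, paygd):+.3f}")  # strongly negative
print(f"RATE  vs PAYGD: {pearson(rate, paygd):+.3f}")   # approximately zero
```

Because every paygrade group is centered on zero after the transformation, the correlation with paygrade vanishes by construction. Passing a (CMF, PAYGD) pair as the group key would give the analogous normalization for each career field and paygrade combination.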

The distribution shape of the RATE histogram, shown in 
Figure 4.2, appears slightly non-normal, but a check of the 
summary statistics for quantiles shows that they correspond 
closely to the standard normal quantiles. Thus, the 
assumption of normality for procedures using this variable is 
still reasonable, based on observation of the distribution 
shape and the close agreement of quantile values. 

Figure 4.2 presents a histogram and summary statistics for 
the RATE variable. 



RATE HISTOGRAM AND STATISTICS 

HISTOGRAM TABLE 

X               : RATE 
SELECTION       : ALL 
X LABEL         : RATE 
NO. OF ELEMENTS : 37854 
X MEAN          : -1.565E-6 
STD. DEVIATION  : 0.99997 
SKEWNESS        : 0.21408 
KURTOSIS        : 2.3767 
5-PERCENTILE    : -1.5476 
25-PERCENTILE   : -0.77578 
MEDIAN          : -0.03757 
75-PERCENTILE   : 0.70754 
95-PERCENTILE   : 1.6234 
X MIN.          : -2.2681 
X MAX.          : 3.6685 

Figure 4.2 



c. PRA 

(1) General. The variable PRA is another 
recomputation of the raw promotion rate. PRA controls for 
the career management field as well as paygrade. It is a set 
of normalized promotion scores which are uncorrelated with 
PAYGD and CMF. The lack of association between PRA and 
these variables was also confirmed by checking correlation 
coefficients. Both variables CMF and PAYGD had near zero 
values of correlation with PRA. 

(2) Values. Computing the variable PRA was done 
in the same manner as for RATE; however, a mean and standard 
deviation for each CMF and PAYGD combination were computed and 
used in the normalization equation. 



(3) Attributes. PRA is a continuous variable 
with a ratio scale. The distribution of PRA appears normal, 
with the quantile values very close to the standard normal. 
A comparison of percentile values for PRA versus the standard 
normal is shown in Table III. 



PRA HISTOGRAM AND STATISTICS 

HISTOGRAM TABLE 

X               : PRA 
SELECTION       : ALL 
X LABEL         : PRA 
NO. OF ELEMENTS : 37854 
X MEAN          : 7.41E-9 
STD. DEVIATION  : 0.99881 
SKEWNESS        : 0.21406 
KURTOSIS        : 2.6652 
5-PERCENTILE    : -1.5518 
25-PERCENTILE   : -0.75252 
MEDIAN          : -0.04146 
75-PERCENTILE   : 0.69604 
95-PERCENTILE   : 1.7086 
X MIN.          : -3.4988 
X MAX.          : 4.5374 

Figure 4.3 

A comparison of percentiles for the PRA distribution 
versus the standard normal distribution is shown in Table III. 
Specifically, the PRA percentile values are listed with the 
corresponding standard normal percentile values for the same 
data point. For example, -1.5510 is the PRA fifth percentile, 
while a -1.5510 indexed in a standard normal table results in 
a six percent value. 

TABLE III  Comparison of PRA vs Standard Normal Percentiles 

PRA      Standard Normal 

5%       6% 
25%      22.6% 
50%      48.4% 
75%      75.7% 
95%      96.3% 


Normality for this variable will be assumed based on 
general distribution shape and the close correspondence of 
the data percentiles to the standard normal percentiles. 
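
The percentile comparison can be reproduced approximately by evaluating the standard normal cumulative distribution at each PRA sample quantile. This is a sketch only: the quantile values are those printed with Figure 4.3, the function name is hypothetical, and the computed percentages should be expected to agree with Table III only to within about a percentage point, since the original values were presumably read from a printed normal table.

```python
from math import erf, sqrt


def standard_normal_cdf(z):
    """Cumulative probability of a standard normal variable at z."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


# PRA sample quantiles as printed with Figure 4.3.
pra_quantiles = {5: -1.5518, 25: -0.75252, 50: -0.04146, 75: 0.69604, 95: 1.7086}

for pct, value in pra_quantiles.items():
    normal_pct = 100.0 * standard_normal_cdf(value)
    print(f"PRA {pct}th percentile ({value:+.4f}) -> standard normal {normal_pct:.1f}%")
```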

2. Control Variables 

d. SEX 

The variable SEX is discrete and nominal. Males 
are represented by a numerical value of one, and females are 
represented with a two. In the study sample, 12.29 percent 
of the NCO's were female, and 87.71 percent were male. 

e. CMF 

Career Management Field (CMF) is a discrete 
variable with nominal scale. Thirty-three CMF's are 
represented in the sample. Each Career Management Field is 
assigned a numerical value; for example, the Infantry branch 
is designated as CMF 11. These assignments are a Department 
of the Army numbering system, and can be reviewed along with 
the CMF percentage and frequency table in Appendix A. 

There is some ordinal information in the numbering 
system; for instance, low CMF numbers are indicative of a 
combat branch, such as Infantry or Armor. Center CMF values 
are indicative of combat support branches, such as Signal and 
Chemical. Upper CMF values are from the combat service 
support branches, such as Medical and Language Specialist. 

Figure 4.4, the CMF histogram, does reflect the 
distribution of the three general groupings of CMF densities: 
combat, combat support, and combat service support. The 
combat and combat support values have roughly equivalent 
representation, while the upper numbered service support 
CMF's are about two thirds the size of the other groups. 



CMF HISTOGRAM 

Figure 4.4 

f. RACETH 

The race-ethnic variable is a discrete, nominal 
variable. The values represented and their percentages are 
shown in Table IV. 

TABLE IV  Sample Race Percentages 

Value   Race                              Percent   Cumulative 
                                                    Percent 

1       White                             52.43     52.43 
2       Black                             38.59     91.02 
3       Hispanic                          5.58      96.6 
4       American Indian/Alaskan Native    .26       96.86 
5       Asian/Pacific Islander            1.15      98.01 
6       Other/Unknown                     1.99      100.00 



g. PAYGD 

Paygrade is a discrete, ordinal variable. The 
selection of NCO rank from personnel enlisting after 1976 
resulted in representation by paygrades E-5 through E-7 only. 
The distribution of PAYGD is shown in Table V. 



TABLE V  Sample Paygrade Percentages 

Value   Rank                  Percent   Cumulative 
                                        Percent 

5       Sgt E-5               73.29     73.29 
6       Staff Sergeant E-6    25.89     99.19 
7       SFC E-7               0.81      100.00 



The 0.81 percent for E-7 results in only 307 SFC's in the 
sample. Despite the preponderance of representation by the 
other ranks, a sample size of 307 for the E-7 rank still 
allows for adequate representation of that subcategory. 

3. Intelligence and Academic Scores 

h. GTSCR 

The General Intelligence Test Score (GTSCR) of 
the individual is a continuous variable with at least an 
ordinal scale. The range of values runs from 50 through 160. 
The lower value of 50 represents the corresponding minimum 
score of ASVAB modules that would allow for enlistment in the 
Army. The histogram of the GTSCR variable, shown in Figure 
4.5, is approximately normal. Checking the quantiles shows a 
larger density in the distribution to the left of the mean, 
with slightly lower values for quantiles right of the mean. 



GTSCR HISTOGRAM AND STATISTICS 
(N=37854) 

HISTOGRAM TABLE 

X               : GTSCR 
SELECTION       : ALL 
X LABEL         : GTSCR 
NO. OF ELEMENTS : 37854 
X MEAN          : 108.23 
STD. DEVIATION  : 14.275 
SKEWNESS        : 0.129 
KURTOSIS        : 3.3632 
5-PERCENTILE    : 84 
25-PERCENTILE   : 99 
MEDIAN          : 109 
75-PERCENTILE   : 117 
95-PERCENTILE   : 130 
X MIN.          : 54 
X MAX.          : 156 

Figure 4.5 

i. AFQTP 

The Armed Forces Qualification Test Percentile is 
a continuous variable with ordinal scale. Its value 
represents the relative standing of an individual's test 
score referenced against a 1944 population. This means that 
an individual's raw AFQT score is compared against a standard 
table of values that was developed in 1944. This table of 
values from 1944 was designed to represent the distribution 
of raw AFQT test scores for the entire 1944 American youth 
population. Hence, a resulting individual AFQT score is 
simply the corresponding percentile of the individual raw 
AFQT score relative to the entire 1944 population AFQT test 
distribution. 

The histogram and summary statistics for AFQTP are shown 
in Figure 4.6. The density of AFQTP is partially symmetric 
about the mean. The fifth percentile is at a 
value of 21, demonstrating the restriction applied to CAT V 
and VI personnel since 1980. Use of the AFQT score for this 
study is primarily for comparative reasons. AFQT cannot be 
used in any developed model since scoring against the 1944 
reference population has ceased. As will be seen in 
subsequent chapters, AFQT was discarded anyway when OAFQT 
proved to be a better explanatory variable. 

AFQTP HISTOGRAM AND STATISTICS 

HISTOGRAM TABLE 

X               : AFQTP 
SELECTION       : ALL 
X LABEL         : AFQTP 
NO. OF ELEMENTS : 37854 
X MEAN          : 53.419 
STD. DEVIATION  : 20.965 
SKEWNESS        : 0.29913 
KURTOSIS        : 2.2128 
5-PERCENTILE    : 21 
25-PERCENTILE   : 37 
MEDIAN          : 50 
75-PERCENTILE   : 68 
95-PERCENTILE   : 91 
X MIN.          : 10 
X MAX.          : 99 

Figure 4.6 

j. OAFQTP 

The OAFQTP variable is a continuous variable with 
ordinal scale. It is fundamentally the same as the AFQTP 
variable, except for the reference for measurement, which is a 
1980 population. The distribution for OAFQTP is considerably 
more dense in the lower values than AFQTP. Explanation of 
this shift can be seen by reviewing the transformation tables 
in Appendix A for converting 1944-based scores to 1980 
scores. The transformations for values below 80 reduce a 
1944-based score in almost every case. The 
amount of reduction varies, but it can be as much as four 
points. Only when the scores go above 85 are there any 
increasing transformations. 

OAFQT HISTOGRAM AND STATISTICS 

HISTOGRAM TABLE 

X               : OAFQTP 
SELECTION       : ALL 
X LABEL         : OAFQT 
NO. OF ELEMENTS : 37854 
X MEAN          : 45.319 
STD. DEVIATION  : 24.779 
SKEWNESS        : 0.53139 
KURTOSIS        : 2.1725 
5-PERCENTILE    : 14 
25-PERCENTILE   : 25 
MEDIAN          : 41 
75-PERCENTILE   : 64 
95-PERCENTILE   : 92 
X MIN.          : 1 
X MAX.          : 99 

Figure 4.7 



k. EIMCAT 

EIMCAT is the mental category of an individual 
based on the 1980 reference population AFQT test score. 
EIMCAT is a discrete and ordinal scale variable. The 
assignment of categories is a Department of Defense standard, 
and is a common reference for all services. The breakdown of 
values is as follows: 



TABLE VI  Sample Mental Category Percentages 

Value   Category     AFQT     Percent   Cumulative 
                                        Percent 

1       Cat V        01-09    .33       .33 
2       Cat IV C     10-15    6.736     7.067 
3       Cat IV B     16-20    9.788     16.854 
4       Cat IV A     21-30    19.187    36.041 
5       Cat III B    31-49    26.116    62.157 
6       Cat III A    50-64    13.053    75.21 
7       Cat II       65-92    19.99     95.2 
8       Cat I        93-99    4.8       100.000 

A histogram of the EIMCAT values follows in Figure 4.8. 

SAMPLE EIMCAT DISTRIBUTION 

Figure 4.8 

Observation of the above figures demonstrates more 
clearly the fact that categorization into EIMCAT category is 
not evenly distributed across the scale of OAFQT scores. For 
example, the center EIMCAT, value five, spans almost twenty 
points, while EIMCAT eight contains only the upper seven 
point scores. EIMCAT does make available an established, 
discrete scale measurement representing intelligence test 
scores for use in appropriate statistical procedures. 
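
The uneven category widths are easy to see when the assignment is written out. The following minimal sketch maps a 1980-referenced percentile to its EIMCAT value using the ranges listed in Table VI; the function and constant names are hypothetical.

```python
from bisect import bisect_right

# Lower bound of each mental category's percentile range, from Table VI.
LOWER_BOUNDS = [1, 10, 16, 21, 31, 50, 65, 93]
LABELS = ["Cat V", "Cat IV C", "Cat IV B", "Cat IV A",
          "Cat III B", "Cat III A", "Cat II", "Cat I"]


def eimcat(oafqtp):
    """Return (EIMCAT value 1-8, category label) for a percentile of 1-99."""
    if not 1 <= oafqtp <= 99:
        raise ValueError("percentile must be between 1 and 99")
    value = bisect_right(LOWER_BOUNDS, oafqtp)  # number of lower bounds <= score
    return value, LABELS[value - 1]


for score in (9, 31, 49, 93):
    print(score, eimcat(score))
```

Scores of 31 and 49 land in the same category even though they are eighteen points apart, while a single point separates Cat II from Cat I at the 92/93 boundary.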

l. HIYRED 

HIYRED is the highest year of education held by 
the individual upon entry into the Army. It is a discrete 
and ordinal scale variable. The values and distribution 
percentages are shown in Table VII. 

TABLE VII  Sample Highest Year of Education 

Value   Category                         Percent   Cumulative 
                                                   Percent 

1       1-7 Years                        0.018     0.018 
2       8 Years                          0.153     0.172 
3       1 Year High School               1.397     1.569 
4       2 Years High School              4.7       6.269 
5       3-4 Years HS (no diploma)        6.935     13.203 
5.5     High School GED                  4.813     18.017 
6       High School Diploma              71.274    89.29 
7       1 Year College                   3.305     92.595 
8       2 Years College                  3.453     96.048 
9       3-4 Years College (no degree)    1.337     97.385 
10      College Graduate                 2.560     99.945 
11      Masters or Equivalent            0.05      99.995 
12      Doctorate or Equivalent          0.005     100.000 




m. EDLVL 

EDLVL is the present level of education for the 
individual. These scores are related to HIYRED, in that any 
education taken by the individual subsequent to enlistment is 
recorded in this variable. A GED equivalency is included as 
a value of six for high school completion. 






TABLE VIII  Sample Education Level Percentages 

Value   Category                         Percent   Cumulative 
                                                   Percent 

1       1-7 Years                        0.042     0.042 
2       8 Years                          0.011     0.053 
3       1 Year High School               0.198     0.251 
4       2 Years High School              0.793     1.043 
5       3-4 Years HS (no diploma)        1.503     2.547 
6       High School Diploma              80.443    82.99 
7       1 Year College                   6.089     89.079 
8       2 Years College                  5.828     94.907 
9       3-4 Years College (no degree)    2.037     96.944 
10      College Graduate                 2.948     99.829 
11      Masters or Equivalent            0.1       99.992 
12      Doctorate or Equivalent          0.008     100.000 

Observation of Figure 4.9, or the percentages in Table VIII, 
shows an upward shift of education level after 
enlistment. This is made possible, and encouraged, by official 
continuing education and high school completion programs. 



HIYRED AND EDLVL PERCENTAGES 

Figure 4.9 



n. NCOE 

The Noncommissioned Officer Education variable, 
NCOE, is a discrete and ordinal scale variable. It reports 
the level of military schooling accomplished by the 
individual. Military schooling categories are generally 
organized in three ascending levels: primary, basic, and 
advanced. At the two lower levels, primary and basic, there 
are separate courses for combat and non-combat CMF's. In 
some cases, there has been an award of an On-The-Job Training 
qualification. The OJT award is used to give credit to an 
NCO who can achieve technical competence in advance of being 
eligible for promotion to the next higher paygrade. 

As previously mentioned, attendance at military schools 
is sometimes associated with an individual being previously 
identified as a superior performer. This is true mostly in 
the advanced level schools where selection for attendance is 
through Department of the Army Selection Boards. At the 
primary level, local commanders have authority to establish 
selection procedures and often will make primary school 
attendance a locally mandatory requirement for junior NCOs. 
Table IX and Figure 4.10 demonstrate the categories and 
distribution of NCOE. 





TABLE IX  Sample NCOE Percentages 

Value   Category                             Percent   Cumulative 
                                                       Percent 

0       Nonparticipant                       21.19     21.19 
1       Primary NCO Course (CBT CMF)         4.46      25.65 
2       Primary Leadership Graduate          39.36     65.25 
3       On-The-Job Credit for E-5 skills     5.38      70.63 
4       Primary Technical Course Graduate    2.82      73.45 
5       On-The-Job Credit for E-6 skills     0.0       73.45 
6       Basic Technical Course Graduate      5.11      78.56 
7       Basic NCO Course (CBT CMF)           15.99     94.55 
8       On-The-Job Credit for E-7 skills     .01       94.56 
9       Advanced NCO Course Selectee         2.28      96.84 
10      Advanced NCO Course Graduate         3.06      99.89 
11      Advanced NCO nongraduate/OJT         .01       99.9 
12      On-The-Job Credit for E-8 skills     .06       100.00 



Figure 4.10 presents a histogram of NCOE discrete levels. 

SAMPLE NCOE SCHOOLING PERCENTAGES 

Figure 4.10 



o. PQSCR 

PQSCR is a report of the Primary Military 
Occupation Skill Qualification Test Score (SQT) of the 
individual. It is a continuous and ratio-valued variable. 
The SQT is a service-related test which is used to determine 
the technical competence of a soldier. SQT score has been 
used by promotion boards as a qualitative measure for 
promotion. The numerical value represents the percent of 
correct answers on a written and hands-on evaluation. 
Separate SQT tests are written for each CMF, although the 
structure of the tests is similar. 

The distribution of PQSCR, shown in Figure 4.11, is more 
dense in the upper values, with an abnormally long left tail 
extending to a lower bound of 21. An explanation for the 
shape of the PQSCR distribution is an involved topic, and has 
itself been the subject of study. A general observation is 
that PQSCR has previously been used in a manner where 
individual soldier scores were often aggregated as a means of 
comparison of the parent unit of the soldiers. [Ref. 11:p. 4] 
Thus, significant unit and individual training emphasis has 
been focused on SQT testing in previous years, and pressure 
to perform well was influenced by the parent organizations. 
As a result, a negatively skewed distribution, rather than a 
normal distribution, is understandable. 
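
The direction of skew can be checked numerically. A long left tail corresponds to a negative moment coefficient of skewness, which agrees with the SKEWNESS value reported with Figure 4.11. The following is a minimal sketch only; the function name and the score list are hypothetical illustrations and are not taken from the thesis data.

```python
def sample_skewness(values):
    """Moment coefficient of skewness: third central moment / (variance ** 1.5)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

# Hypothetical scores bunched near the top with a few very low values
# (a long left tail, similar in shape to the PQSCR description).
scores = [21, 45, 60, 70, 75, 78, 80, 82, 85, 87, 88, 90, 92, 95, 100]

print("skewness is negative:", sample_skewness(scores) < 0)
```

Reflecting the same scores about zero reverses the tail and therefore the sign of the coefficient.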



PQSCR HISTOGRAM AND STATISTICS 

HISTOGRAM TABLE 

X                : PQSCR 
SELECTION        : ALL 
X LABEL          : PQSCR 
NO. OF ELEMENTS  : 37854 
X MEAN           : 78.384 
STD. DEVIATION   : 11.609 
SKEWNESS         : -0.70832 
KURTOSIS         : 3.5739 
5-PERCENTILE     : 57 
25-PERCENTILE    : 71 
MEDIAN           : 80 
75-PERCENTILE    : 87 
95-PERCENTILE    : 95 
X MIN.           : 21 
X MAX.           : 100 

Figure 4.11 



3. Summary 

The fifteen variables used in this study demonstrate 
a wide variety of characteristics. All of the dependent 
variable choices were continuous, with two, RATE and PRA, 
showing only slight departures from normality. The other 
continuous variables did not have identifiable distributions, 
and could not be transformed to normality using power or log 
transformations. Nor is it entirely clear that one would need 
to use a transformed variable in subsequent analysis. 

The independent variables consist of a mixture of 
continuous and discrete values, with both ordinal and ratio 
scales. Within the independent variables there are two 
principal sets of related variables. The intelligence test 
scores, AFQTP, OAFQTP, EIMCAT, and to a lesser extent GTSCR, 
are all derived from the ASVAB. These variables differ from 
one another in varying degrees, and are either a 
re-expression, a transformation, or a similarly derived set 
of scores. 

The two academic performance measures, EDLVL and HIYRED, 
are related, in that EDLVL simply reflects any additional 
schooling completed since entry into the Army. 

Despite the similarities within these two sets of 
variables, it is felt that sufficient differences in 
informational value are present in each expression. Further, 
since the variables used are all standard data collection 
items for the DMDC database, each variable expression will be 
studied. The relative merit of any single or combined 
variable from this study may be useful to managers seeking 
appropriate data sources for other studies. 

An important result of the analysis of these study 
variables is the observation that many of the necessary 
assumptions for standard parametric hypothesis testing, 
Analysis Of Variance (ANOVA), and possibly regression will 
not be met. These include assumptions about the form of the 
distribution as well as the scale of the variable. In this 
study, analysis will initially seek to use standard 
parametric methods. However, if results of the analysis are 
sensitive to distributional or scale assumptions, those 
assumptions will be checked. If examination of assumption 
requirements fails, or if there is a nonparametric test of 
similar efficiency, nonparametric tests will be conducted as 
a replacement or as a confirmatory procedure. 

C. BIVARIATE ANALYSIS 

This section will concentrate on identifying 
relationships between pairs of variables, and on identifying 
shifts in distribution as a function of the effects, or 
categorical, variables. Three methods of analysis will be 
used in this section. The first method is analysis of 
association using a matrix of Pearson product-moment 
correlations. This will provide initial information as to 
the strength of association between any two variables, and 
the direction of that relationship, being either positively 
or negatively correlated. The second method will be analysis 
of scatterplots of pairs of variables, using the techniques 
of LOWESS and Jittering to better view any trends in the 
variables. This method will give initial information on what 
type of fitted line, and hence what mathematical 
relationship, exists between independent and dependent 
variables. Of significant interest will be whether the 
relationship is fundamentally linear, or whether it is 
possibly polynomial or curvilinear. The third and final 
method used will be analysis of three-dimensional empirical 
distribution plots. This will demonstrate some shifts in 
distribution within several of the effects variables. 

1. Correlation Matrix 

As earlier mentioned, the purpose of reviewing the 
Pearson product-moment correlation matrix is to identify 
pairs of variables which have a strong association. The 
range of the correlation coefficient, rho, is from -1 to +1, 
and a value of zero indicates that the variables have no 
linear association with each other. A value of +1 indicates 
an exact direct linear relationship, while a -1 indicates an 
exact inverse linear relationship. This measurement of 
association is not completely indicative of dependency, and 
is only a preliminary tool to identify candidate variables 
for testing and subsequent inferential statistics. 

Remembering the central question of this thesis, the most 
important pairs of variables will then be any of the 
intelligence and academic scores paired with the promotion 
rate variables. Of almost equal interest will be any 
interval scale effects variables demonstrating a strong 
linear relationship with the promotion variables. 

The strength of the linear relationship between two 
variables, or its level of significance, is based on how much 
variance there is in the estimated value of rho. Further, 
the variance of rho is dependent on the sample size being 
considered. For example, if the sample size were small, and 
the value of rho had a standard deviation of plus or minus 
.3, then a large positive or negative value of rho would be 
needed to effectively demonstrate significance. Conversely, 
for a large sample set with very small standard deviation for 
rho, a much smaller rho value could be considered 
significant. An estimate for the standard deviation of rho 
can be found by computing the inverse of the square root of 
the sample size. Considering the thesis sample size of 
37,854, the resulting estimate of the standard deviation of 
rho is .005139. Thus, a value of rho different from zero by 
plus or minus .01, roughly two standard deviations, could be 
considered significant. 
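
The arithmetic behind this threshold can be reproduced directly. The sketch below applies the large-sample approximation described in the text, the inverse of the square root of the sample size; the function name is an illustrative assumption.

```python
import math

def approx_sd_of_rho(sample_size):
    """Large-sample approximation to the standard deviation of rho under
    the hypothesis of no correlation: 1 / sqrt(n)."""
    return 1.0 / math.sqrt(sample_size)

n = 37854                      # thesis sample size
sd = approx_sd_of_rho(n)
threshold = 2 * sd             # roughly the .01 cutoff used in the text

print("approximate sd of rho:", sd)
print("two-sd threshold:     ", threshold)
```

With a sample of only 11 records the same approximation gives a standard deviation of about .3, which is the small-sample situation described in the paragraph.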

In Table X the complete Pearson product-moment 
correlation matrix for the study variables is given. The 
Pearson product-moment computation is a parametric method and 
assumes pairs of normal and continuous variables. This is 
the preferred method since we are primarily interested in 
correlations with either the RATE or PRA variable as one of 
the pair of variables. Additionally, it is possible, using 
the Spearman nonparametric method, to compute a correlation 
value rho for pairs of ordinal, or higher scale, variables. 
[Ref. 13:pp. 251-253] The Spearman method is a 
distribution-free method providing correlations based on the 
ranks of the variables. The last column on the second part of 
Table X lists the correlations computed using the Spearman 
method. Comparison of Spearman versus Pearson values showed 
that there was an acceptable correspondence between the two 
methods, and Pearson values are used exclusively to simplify 
analysis. 
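
The difference between the two coefficients can be illustrated with a short sketch. The Spearman value is simply the Pearson value computed on ranks, with tied observations given the average of their ranks. The function names and the small data set are hypothetical and are not drawn from the thesis data.

```python
import math

def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def average_ranks(values):
    """Rank values 1..n, giving tied values the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical data: monotone but nonlinear relationship.
x = [1, 2, 3, 4, 5, 6]
y = [1, 8, 27, 64, 125, 216]

print("Pearson :", pearson(x, y))
print("Spearman:", spearman(x, y))
```

For a perfectly monotone relationship the rank-based value is one, while the Pearson value falls below one because the relationship is not a straight line.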

Even with application of both the Spearman and Pearson 
methods there remained several pairs of variables which did 
not meet the assumed distributional characteristics for 
correct interpretation of the rho value. These variables are 
the discrete, nominal variables SEX, RACETH, and possibly 
CMF. Their results are included in Table X, but any 
interpretation of the rho value would be ineffective. The 
most important rho values in Table X are located under the 
PRA column and are underlined. 

TABLE X  Pearson Correlation Coefficients 

          PRATE    RATE     PRA   GTSCR   AFQTP  OAFQTP  EIMCAT   PQSCR 

PRATE     1.000    .822    .790    .035    .100    .177    .174    .039 
RATE       .822   1.000    .951    .118    .155    .209    .200    .101 
PRA        .790    .951   1.000    .107    .133    .177    .170    .094 
GTSCR      .035    .118    .107   1.000    .741    .734    .689    .274 
AFQTP      .100    .155    .133    .741   1.000    .937    .903    .308 
OAFQTP     .177    .209    .177    .734    .937   1.000    .955    .315 
EIMCAT     .174    .200    .170    .689    .903    .955   1.000    .305 
HIYRED     .156    .168    .177    .210    .215    .245    .209    .066 
EDLVL      .085    .139    .162    .266    .257    .266    .241    .100 
NCOE      -.200    .047    .006    .039   -.009   -.060   -.062    .093 
SEX        .013   -.019    .036    .055    .159    .050    .062   -.013 
CMF        .074   -.143    .000    .113    .106    .074    .067   -.042 
RACETH    -.064   -.084   -.057   -.242   -.305   -.325   -.314   -.128 
PAYGD     -.495    .000    .000    .143    .087    .031    .023    .097 
PQSCR      .039    .101    .094    .274    .398    .315    .305   1.000 

PEARSON COEFFICIENTS CONTINUED 

                                                                SPEARMAN 
          PAYGD  HIYRED   EDLVL    NCOE     SEX     CMF  RACETH   PRATE 

PRATE     -.495    .157    .085   -.200    .013   -.075   -.064   1.000 
RATE       .000    .168    .139    .047   -.018   -.142   -.084    .808 
PRA        .000    .178    .162    .005    .036    .000   -.056    .777 
GTSCR      .143    .210    .265    .039    .054    .113   -.242    .020 
AFQTP      .087    .215    .258   -.009    .159    .107   -.306    .075 
OAFQTP     .031    .245    .266   -.060    .049    .074   -.325    .165 
EIMCAT     .023    .209    .242   -.062    .063    .068   -.313    .158 
HIYRED     .001   1.000    .708   -.063    .131    .146    .024    .147 
EDLVL      .098    .708   1.000    .004    .114    .177    .039    .038 
NCOE       .433   -.063    .004   1.000   -.081   -.184    .015   -.208 
SEX        .057    .131    .114   -.081   1.000    .258    .042    .020 
CMF        .053    .146    .177   -.184    .258   1.000    .025   -.069 
RACETH    -.016    .024    .039    .015    .042    .025   1.000   -.092 
PAYGD     1.000    .000    .098    .432   -.056   -.054   -.016   -.535 
PQSCR      .097    .066    .100    .093   -.013   -.042   -.128 

The most significant observations from the tables are 
summarized as follows: 

For the variable RATE there is zero correlation with the 
PAYGD variable. Thus, the transformation of PRATE to RATE 
did remove the influence of paygrade on promotion rate. 
Similarly, for the variable PRA, both PAYGD and CMF have zero 
correlation. 

As expected, the three promotion rate variables are all 
highly correlated in a positive direction. 

With two exceptions, the correlation values for the 
effects and independent variables have similar magnitudes and 
signs across all three expressions of promotion rate. The 
first exception is the NCOE variable. Under PRATE it is 
negatively correlated with a value of -0.2, and positively 
correlated with lower values for RATE and PRA. This result 
makes sense when one considers that NCOE is highly correlated 
with PAYGD (0.565). Specifically, raw promotion rates are 
lower for higher grade NCO's due to time in service and time 
in grade requirements (-.495). Hence, NCOE, which is highly 
correlated with PAYGD, will also reflect that inverse 
relationship. When the influence of paygrade is eliminated, 
as it is in RATE and PRA, this negative correlation is 
incidentally removed. 

The second exception is for the variable SEX, where it is 
positively signed for PRATE and PRA, but negatively signed 
for RATE. The magnitudes of all three values are close to 
zero. An explanation for the difference in sign between PRA 
and RATE will be presented in the analysis of empirical 
distributions and coded scatterplots. 

Groups of closely related variables have generally the 
same correlation across the three promotion variables. 
Specifically, AFQTP, OAFQTP, EIMCAT, and to a lesser extent, 
GTSCR, all demonstrate a strong positive correlation against 
each other, and show the same trend when compared against the 
promotion rate variables. The academic variables HIYRED and 
EDLVL demonstrate similar characteristics; however, EDLVL is 
weaker than HIYRED with respect to the promotion rate 
variables. 



Considering RATE and PRA as the better promotion 
variables to model with, and allowing for only one variable 
from each of the related groups, the six most significant 
correlated variables were selected. These variables, listed 
in descending absolute value of rho, are shown in Table XI. 



TABLE XI  Most Significant Correlated Variables 
          Considering both RATE and PRA 

Variable    Rho Value 

HIYRED      approx  0.17 
OAFQTP      approx  0.14 
GTSCR       approx  0.10 
PQSCR       approx  0.09 
RACETH      approx -0.06 
NCOE        approx  0.006 



These variables, paired either with RATE or PRA, were 
used as the starting basis for multivariate regression 
analysis. The effects variable SEX was included for 
subcategory analysis in an effort to detect any influence it 
might have on the primary relationships. 

2. Paired Scatter Plots and Simple Regression 

Plots of paired independent and dependent variables 
were implemented to accomplish two purposes. The first 
purpose was to visually search for any dominant plotting 
patterns. Since the rho values found in the previous section 
are designed to detect only linearity, it is quite possible 
that nonlinear relationships could exist between the 
explanatory and dependent variables. For example, if the X-Y 
relationship was strictly Y=X^2, with X values symmetric 
about zero, a computed rho value should be zero. Thus, if one 
relied only on correlation coefficients to detect 
relationships, one would be misled into thinking that no 
relationship existed between the two variables. Simply 
plotting X-Y scatterplots of the explanatory variables with 
the promotion variables did not require specification of the 
response of the dependent variable. Visual observation could 
then be relied upon to detect dominant patterns of any form. 
These scatterplots used two special procedures, LOWESS and 
Jittering, which will be described in analysis of Figures 
4.12 and 4.13. 
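
The quadratic example can be verified with a few lines of code. This is a sketch on hypothetical values; it shows that an exact relationship Y = X squared yields a correlation of zero when the X values are symmetric about zero, and a large correlation when they are not.

```python
import math

def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# X symmetric about zero: a perfect quadratic relationship, yet rho is zero.
x_symmetric = [-3, -2, -1, 0, 1, 2, 3]
rho_symmetric = pearson(x_symmetric, [v ** 2 for v in x_symmetric])

# X all non-negative: the same relationship now shows a strong rho.
x_positive = [0, 1, 2, 3, 4, 5, 6]
rho_positive = pearson(x_positive, [v ** 2 for v in x_positive])

print("rho, symmetric X:", rho_symmetric)
print("rho, positive X: ", rho_positive)
```

The contrast between the two cases is the reason a scatterplot is a useful companion to the correlation matrix.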

Secondly, simple least squares regression was performed 
for all variables which had been previously found to be 
significantly correlated. The simple least squares 
regression procedure yielded a value called the Coefficient 
of Determination, or R2 (R-square). R2 is mathematically 
related to rho, and in the one variable case, the square 
of rho is equal to R2. Thus, R2 can also be used to 
qualitatively interpret the strength of linearity for a 
simple linear model. The advantage of producing R2 values 
was that R2 directly represents the proportion of variance 
accounted for by the assumption of a linear model. The 
results for each of the regressions and an explanation of R2 
will be discussed in analysis of Table XII. 
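
The equality between R2 and the square of rho in the one variable case can be confirmed numerically. The sketch below fits a simple least squares line to a small hypothetical data set; the function name and the data are illustrative assumptions only.

```python
def simple_regression(x, y):
    """Least squares fit of y = intercept + slope * x.
    Returns (slope, intercept, r_squared, rho)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    # Residual sum of squares about the fitted line.
    sse = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    r_squared = 1 - sse / syy          # proportion of variance explained
    rho = sxy / (sxx * syy) ** 0.5     # Pearson correlation
    return slope, intercept, r_squared, rho

x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 6]                    # hypothetical responses
slope, intercept, r_squared, rho = simple_regression(x, y)

print("R2:", r_squared, " rho squared:", rho ** 2)
```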

a. Paired Scatterplots 

Since interpretation of the correlation 
coefficients assumes linearity, visual analysis of pairwise 
scatterplots was used to search for observable patterns, 
linear or otherwise. This visual approach did not require 
interpretation of single derived parameters to identify any 
patterns. 

In producing the scatterplots the LOWESS procedure was 
used. LOWESS, which stands for Locally Weighted Regression 
Scatter Plot Smoothing [Ref. 12:pp. 94-95], is a nonparametric 
smoothing procedure which is designed to estimate functional 
relationships between Y and X. In particular, no linear or 
quadratic relationship is assumed. For scatterplots of 
discrete variables against the continuous promotion rate 
variables, the discrete variables were Jittered to overcome 
repeated plotting of points. Jittering involves generating 
small random increments, which are then added to the X 
values. As a result, when the X-Y plot is performed fewer X 
values are repeatedly plotted in the same location, and a 
better visual interpretation can be made of the quantity of X 
values at a discrete level. 
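
Both techniques can be sketched briefly. The code below is an illustration under stated assumptions and is not the procedure used to produce the thesis figures: the jitter width, the smoothing fraction, and the function names are hypothetical, and the smoother is a single locally weighted linear fit with tricube weights, without the robustness iterations of the full LOWESS procedure.

```python
import random

def jitter(values, width=0.2, seed=0):
    """Add a small uniform random increment to each discrete X value."""
    rng = random.Random(seed)
    return [v + rng.uniform(-width, width) for v in values]

def lowess_fit(x, y, x0, frac=0.5):
    """Locally weighted linear estimate of Y at x0 (tricube weights,
    no robustness iterations)."""
    n = len(x)
    k = min(n, max(2, int(round(frac * n))))   # points in the local window
    dists = sorted(abs(xi - x0) for xi in x)
    h = dists[k - 1] or 1e-12                  # bandwidth: k-th nearest distance
    w = [(1 - min(abs(xi - x0) / h, 1.0) ** 3) ** 3 for xi in x]
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx if sxx > 0 else 0.0
    return my + slope * (x0 - mx)

# Jittering spreads repeated discrete levels so that they no longer overplot.
levels = [1, 1, 1, 2, 2, 2]
jittered = jitter(levels)

# On exactly linear data the local fit reproduces the line.
x = list(range(10))
y = [2 * v + 1 for v in x]
smoothed = lowess_fit(x, y, 4.5)

print(jittered)
print(smoothed)
```

Evaluating `lowess_fit` over a grid of X values traces the smooth curve that would be overlaid on the jittered scatterplot.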

The overall results of the LOWESS plots showed that the 
predominant pattern was indeed linear. Further, the linear 
pattern was demonstrated most clearly between pairs of highly 
correlated variables. Figures 4.12 and 4.13 demonstrate that 
linearity and the LOWESS and Jittering techniques 
respectively. As a result, linear modelling techniques were 
considered to be the best choice for subsequent analysis. 

LOWESS SCATTERPLOT OF HIYRED VS PRA 

LOWESS PLOT OF PRA VS OAFQTP 




Figure 4.12 Figure 4.13 

b. Simple Regression 

For pairs of significantly correlated variables, 
a simple least squares regression using PRA as the 
dependent variable was performed. The simple least 
squares regression for pairs yields quantitative results in 
terms of slope values, intercept values, tests of the slope 
and intercept values, and the R2 value. 

The R2 value represents what proportion of total variance 
was explained by the simple linear model. As such, its 
values range from zero to one. An R2 value of zero would 
indicate that a linear model does not account for any 
variance of the dependent values. Correspondingly, a value 
of zero would be the estimate of the slope of the line. The 
significance of R2, like rho, is related to sample size. To 
determine the significance of an R2 value, the results of the 
T test for the slope of the model are checked. If the T 
statistic is large and the probability of a greater T value 
small, a null hypothesis of a slope of zero is strongly 
rejected. Thus, we can be confident of the linearity of the 
model and the derived slope estimate. Sample size is 
considered in this test because the T statistic is computed 
as a function of sample size. Thus, even with a small R2 
value, if the T test for the slope were significant, the R2 
value would necessarily be held as significant. The only 
qualification for a low R2 value would be that there exists 
considerable 'noise' or unaccounted variance in the response 
of the dependent variable. A summary of results is shown in 
Table XII. 
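
The dependence of the T statistic on sample size can be made concrete. For a simple linear regression the T statistic for the slope equals rho times the square root of (n - 2), divided by the square root of (1 - R2). The sketch below applies that identity; the R2 value of 0.01 is an illustrative assumption and is not an entry from Table XII.

```python
import math

def slope_t_statistic(r_squared, n, sign=1):
    """T statistic for the slope of a simple linear regression, written in
    terms of R2 and sample size: t = r * sqrt(n - 2) / sqrt(1 - r**2)."""
    r = sign * math.sqrt(r_squared)
    return r * math.sqrt(n - 2) / math.sqrt(1 - r_squared)

# Same small R2, two very different sample sizes.
t_large_sample = slope_t_statistic(0.01, 37854)   # thesis-sized sample
t_small_sample = slope_t_statistic(0.01, 30)      # hypothetical small sample

print("t with n = 37854:", t_large_sample)
print("t with n = 30:   ", t_small_sample)
```

With the thesis sample size, a model explaining only one percent of the variance still produces a T statistic far beyond the usual two-sided 5 percent point of about 1.96, whereas the same R2 in a sample of 30 does not.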

TABLE XII  Simple Least Squares Summary Data 
           using PRA as Dependent Variable 

Variable  Intercept  Std Err     Slope   Std Err     R2      T 

GTSCR      -0.856   (0.0061)     0.008  (5.6E-04)   .013*   13.8 
AFQTP      -0.338   (0.014)      0.006  (0.0002)    .018*   26.1 
OAFQTP     -0.336   (1.6E-02)    0.007  (3.2E-04)   .033*   22.5 
EIMCAT      0.004   (0.027)     -0.003  (0.005)     .000    -.5 
HIYRED     -0.005   (0.047)     -0.001  (0.008)     .000    -.2 
EDLVL       0.011   (0.054)     -0.003  (0.008)     .000    -.02 
NCOE       -0.020   (0.021)      0.003  (0.003)     .000    1.1 
SEX         0.011   (0.028)     -0.018  (0.024)     .000    -.7 
CMF        -0.023   (1.6E-02)    0.000  (2.6E-04)   .000     .9 
RACETH     -0.009   (0.018)     -0.001  (0.010)     .000    -.1 
PAYGD      -0.045   (0.093)      0.007  (0.018)     .000     .3 
PQSCR      -0.059   (5.4E-02)    0.007  (6.9E-04)   .008*   10.6 


Important observations from the simple paired regression 
analysis are summarized in the following paragraphs. 

Very few sets of pairs result in a significant R2 value. 
Those that do are: GTSCR, OAFQTP, and PQSCR. All three of 
these variables have a positive slope. Analysis of residuals 
for these pairs did show reasonable normality of residuals 
and did not demonstrate any lack of homoscedasticity. 

The remaining variables have a low value positive or 
negative slope. For each of these variables, the 95% 
Confidence Interval for the slope extends from a negative 
lower value to a positive upper value, and so includes zero. 
Thus, no observable ascending or descending relationship can 
be claimed. 
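
The confidence interval check can be illustrated with the slope and standard error values reported in Table XII for SEX (slope -0.018, standard error 0.024) and GTSCR (slope 0.008, standard error 5.6E-04). The sketch assumes a large-sample normal multiplier of 1.96; the function names are illustrative.

```python
def slope_confidence_interval(slope, std_err, z=1.96):
    """Approximate 95% confidence interval for a regression slope,
    using a large-sample normal multiplier (an assumption of this sketch)."""
    return slope - z * std_err, slope + z * std_err

def interval_contains_zero(interval):
    lower, upper = interval
    return lower <= 0.0 <= upper

sex_ci = slope_confidence_interval(-0.018, 0.024)      # values from Table XII
gtscr_ci = slope_confidence_interval(0.008, 5.6e-04)   # values from Table XII

print("SEX   slope interval:", sex_ci, interval_contains_zero(sex_ci))
print("GTSCR slope interval:", gtscr_ci, interval_contains_zero(gtscr_ci))
```

An interval that spans zero, as for SEX, cannot support a claim of either an ascending or a descending relationship, while the GTSCR interval lies entirely above zero.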

Using the variable RATE as the dependent variable in 
the simple regressions results in the variables EIMCAT and 
AFQTP having measurable R2 values and positive slopes. 

As expected, the results of the simple regression 
analysis coincide with observations taken from the 
correlation table. 

When considered one at a time, there appear to be only a 
handful of variables demonstrating a reportable relationship 
with the promotion variables. The low R2 value for each 
regression indicates either a large proportion of pure error, 
or significant unexplained variance due to other explanatory 
variables not being included. 

3. 3-D Empirical Density Plots 

Three dimensional empirical density plots were used 
to visually check for distribution changes in the continuous 
variables within the subcategories of SEX, PAYGD and RACETH. 
Two such plots will be discussed because they depict visually 
data characteristics identified in earlier tabular results. 
These characteristics were: the application of AFQT 
restrictions by congressional mandate in 1980, and the 
differences in OAFQT scores across racial groups. 

The AFQT restriction is depicted in Figure 4.14, where 
empirical densities for OAFQT are plotted for each paygrade. 
Observing the three densities shows that only the E-7 
paygrade distribution contains scores less than twenty. This 
makes sense, considering that all the E-7 enlistments were 
prior to 1980. Another interesting observation from this 
plot is that high OAFQT scores become more dominant as 
paygrade increases. This is most apparent in comparing the 
E-7 density to either the E-5 or E-6. This shift in density 
of OAFQT across the three pay grades suggests that attrition 
tends to manifest itself in the lower AFQT categories, but 
that a low AFQT score is, in itself, not prohibitive in 
achieving senior enlisted rank. 

The second 3-D empirical density plot, Figure 4.15, shows 
the differences in renormed AFQT scores across racial 
subcategories. A large discrepancy between the distribution 
for whites and the distributions for the black or hispanic 
categories is easily seen, although Indians have an AFQT 
distribution similar to that of whites. This 
observation coincides with the occurrence of different 
promotion rates between different racial categories as well. 
However, to make inferences about promotion policy among 
races would require further research. As pointed out by 
Daula [Ref. 11:pp. 7-10], the attrition pattern among 
different racial groups shifts the averages for both 
promotion rate and AFQT among the races over time. Since the 
purpose of this thesis is one of prediction, it is more 
important to identify the effect and account for it in the 
model. An explanation as to the cause of this phenomenon 
does not appear to be easily obtained from the thesis data. 

What is important about this plot is that it visually 
demonstrates the correlation between RACETH and OAFQT. If 
OAFQT is a significant determiner of promotion rate, then 
RACETH will be an important covariate. 

3-D EMPIRICAL DENSITY PLOT 
OAFQT BY PAYGD 

Figure 4.14 

3-D EMPIRICAL DENSITY PLOT 
OAFQT BY RACETH 

Figure 4.15 


D. MULTIVARIATE GRAPHICAL ANALYSIS 



Multivariate graphical analysis consisted of the use of 
Draftsman Plots and Coded Scatter Plots to look for 
relationships when more than two dimensions were under 
consideration. [Ref. 12:pp. 135-139] One of these 
procedures, the Coded Scatterplot, will be utilized to 
demonstrate a significant data characteristic, that 
characteristic being the distribution of SEX, corresponding 
to CMF and PRA, in Figure 4.16. 

Coded Scatterplots involved delineating one of the 
effects variables as a third dimension, while plotting an 
independent variable against a dependent promotion variable. 
In Figure 4.16, CMF values were Jittered and plotted against 
the PRA variable, and the plot points were coded as periods 
for males and the letter F for females. 

CODED SCATTERPLOT 
PRA VS CMF WITH SEX 

Figure 4.16 

Figure 4.16 demonstrates the higher density of female 
personnel in the upper CMF range, which contains the more 
technically oriented career management fields. This 
corresponds to the CMF-SEX correlation coefficient of 0.250 
found in Table X. Likewise, the distributions of both the 
female and male PRA scores are symmetric about the zero line. 
This corresponds to the near-zero value for the PRA-SEX 
correlation coefficient also found in Table X. 

E. LINEAR MODELS 

1. Analysis of Variance 

One-Way ANOVA was used in this thesis as an 
intermediate step in defining a final inference model. 
ANOVA's usefulness has been as an investigative tool to 
detect differences in means among classes of explanatory 
variables. For example, using PRA as the dependent variable 
and EIMCAT as the independent variable, One-Way ANOVA will 
compare and test the equality of the average PRA score across 
the eight levels of EIMCAT, i.e., mental categories one 
through eight. In the testing, the null hypothesis is that 
all eight mental category PRA means are equal, while the 
alternate hypothesis is that they are not. The test 
statistic used to reject or accept the null hypothesis is the 
F statistic. As such, a large F value, and subsequent 
rejection of the null hypothesis, would indicate that there 
exist significant differences between the means of the 
promotion scores for some of the eight mental categories. In 
general, a large F value can be considered to be any computed 
F statistic greater than 3.8, the asymptotic 95 percent point 
for a one degree of freedom model. The nature of these 
differences could be a large discrepancy between a single 
pair of categories, small discrepancies between all eight 
categories, or any combination of difference conditions. 
Thus, ANOVA has limited value in discerning the location and 
magnitude of the differences between category means, but it 
does identify if differences exist and how strong those 
differences are. 
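
The F statistic described above is the ratio of the between-level mean square to the within-level mean square. A minimal sketch of the computation follows; the three groups of scores are hypothetical stand-ins for the PRA values within levels of an explanatory variable, and the function name is an illustrative assumption.

```python
def one_way_anova_f(groups):
    """F statistic for a One-Way ANOVA: between-group mean square divided
    by within-group mean square."""
    all_values = [v for g in groups for v in g]
    grand_mean = sum(all_values) / len(all_values)
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((v - sum(g) / len(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)

# Hypothetical promotion scores at three levels of an explanatory variable.
groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f_value = one_way_anova_f(groups)

print("F statistic:", f_value)
```

As the text notes, a large F indicates only that some level means differ; it does not say which levels differ or by how much.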

Table XIII tabulates a twelve by three matrix of results 
for separate One-Way ANOVA's. The rows are the twelve 
explanatory variables and the columns are the three promotion 
variables. Using all three promotion measures as the 
dependent variable allowed for a check of ANOVA values and 
trends across those measures. 

In addition to the results of the F test, a value of R2 
is reported. This R2 value is different than that reported 
in the simple linear regression model. This is because the 
ANOVA procedure considers the independent variable as a set 
of levels, rather than a single continuous variable. With 
One-Way ANOVA, all variables had some level of R2 reported. 
Further, because of the increased informational value of 
variable categories, and hence, more degrees of freedom for 
computation, the values of R2 increased above the simple 
regression reported values. 
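
The reason the ANOVA R2 cannot fall below the simple regression R2 is that treating each level as its own category fits a separate mean per level, of which a straight line through the levels is a special case. The sketch below compares the two on hypothetical data; the function names and values are illustrative assumptions.

```python
def anova_r2(levels, y):
    """R2 from a One-Way ANOVA: between-level sum of squares / total."""
    grand_mean = sum(y) / len(y)
    groups = {}
    for lv, v in zip(levels, y):
        groups.setdefault(lv, []).append(v)
    ss_total = sum((v - grand_mean) ** 2 for v in y)
    ss_between = sum(
        len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups.values()
    )
    return ss_between / ss_total

def linear_r2(x, y):
    """R2 from a simple least squares line: the squared Pearson correlation."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy ** 2 / (sxx * syy)

# Hypothetical data: the middle level has the highest mean, so a straight
# line explains little, while separate level means explain most of the variance.
levels = [1, 1, 2, 2, 3, 3]
y = [1, 2, 5, 6, 2, 3]

r2_anova = anova_r2(levels, y)
r2_linear = linear_r2(levels, y)
print("ANOVA R2:", r2_anova, " linear R2:", r2_linear)
```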

It should be noted that technically, when the defined 
continuous variables were put into ANOVA, their values were 
grouped, and then the variables were treated as if they were 
discrete. Because the SAS software and computational 
resources used could handle all the integer values for the 
score ranges of AFQTP and the other continuous variables, it 
was possible to gain insight into the existence of 
differences between individual score cells. 

Additionally, nonparametric procedures were used to 
evaluate the relationships. [Ref. 13:pp. 250-255] The 
nonparametric ANOVAs utilized the ranks of the variables and 
also yielded the F statistic for testing the hypothesis of 
equal level means. Having agreement between the parametric 
and nonparametric values removed the need to pursue 
confirmation of assumptions for parametric ANOVA. It will 
also allow analysis of results to focus on the resultant 
values of F and R2 tabulated in Table XIII. 





TABLE XIII  One-Way Anova Summary 

Variable        PRATE               RATE                PRA 
              F       R2         F       R2         F       R2 

SEX (1)       5.9    .00016     13.3    .00351     48.4    .00128 
CMF (2)      35.     .02788     93.3    .07415      0.0    .00000 
RACETH       90.     .01177    165.0    .02133     80.0    .01049 
PAYGD (3)  6292.     .24953      0.0    .00000      0.0    .00000 
GTSCR        18.     .04250     13.4    .03184     10.9    .02636 
AFQTP        32.     .07046     20.6    .04623     17.3    .03908 
OAFQTP       36.     .08441     25.3    .06101     19.     .04657 
EIMCAT       37.     .01076     71.5    .02035     96.9    .02739 
HIYRED       96.     .02950    106.0    .03272    117.     .03590 
EDLVL        37.     .01076     71.5    .02035     96.9    .02739 
NCOE        156.     .05097     76.4    .02499     46.8    .01583 
PQSCR         1.9    .00375      6.6    .01341      5.8    .01181 

(1) The Pr>F (level of rejection of the null hypothesis 
    of no difference in means) was .0145 for PRATE, 
    .0003 for RATE and .0001 for PRA. 
(2) The Pr>F for PRA is 1.0. 
(3) The Pr>F for RATE is 1.0, and for PRA is 1.0. 

Values of Pr>F for the remainder of the table were .0001. 

Review of Table XIII demonstrates some anticipated 
results, which are summarized in the following paragraphs. 

Since the variables PAYGD and CMF were controlled for in 
the derivation of PRA, there is correspondingly no 
relationship between those variables and the PRA promotion 
variable. Likewise, the variable PAYGD was controlled for in 
the derivation of RATE, and there was no linear relationship 
demonstrated for that pair. The zero values for the F 
statistic and R2 for those variable combinations document 
this fact. 

Using RATE or PRA as the dependent variable, and allowing 
for only one, most significant variable to be selected from 
each of the intelligence and academic groups, results in the 
same set of explanatory variables as were found in 
correlation analysis. These variables were: HIYRED, OAFQTP, 
GTSCR, PQSCR, RACETH, NCOE, and SEX. The most significant 
variables were the ones which had the larger F statistic, and 
R2 value. This set is not ordered, however, since there are 
differences in order between the PRA and RATE models. 
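
As a minimal sketch of how the F statistic and R2 entries of a One-Way ANOVA arise, the Python below computes both for a single discrete explanatory variable. The function name and the three-level toy data are illustrative assumptions, not values from the study.

```python
from statistics import fmean

def one_way_anova(groups):
    """Return (F, R2) for a one-way ANOVA; `groups` maps level -> list of scores."""
    scores = [y for ys in groups.values() for y in ys]
    grand = fmean(scores)
    # Between-level sum of squares: spread of the level means about the grand mean.
    ss_between = sum(len(ys) * (fmean(ys) - grand) ** 2 for ys in groups.values())
    # Total sum of squares about the grand mean.
    ss_total = sum((y - grand) ** 2 for y in scores)
    ss_within = ss_total - ss_between
    df_between = len(groups) - 1
    df_within = len(scores) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    r2 = ss_between / ss_total  # share of total variation explained by the levels
    return f_stat, r2

# Made-up promotion scores for three levels of a hypothetical discrete variable.
toy = {1: [0.1, 0.3, 0.2], 2: [0.4, 0.6, 0.5], 3: [0.9, 0.7, 0.8]}
f_stat, r2 = one_way_anova(toy)
print(f"F = {f_stat:.2f}, R2 = {r2:.3f}")
```

A larger F and a larger R2 both indicate that the level means are spread widely relative to the scatter within each level, which is the sense in which the text ranks variables as "most significant".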

Another interesting result emerges from ANOVA when
the explanatory variable mean and variance for each level are
plotted against the promotion variable. This is not a standard
analytical plot, but it does provide some visual information
on the size, direction, and dispersion about the center line
of an independent discrete variable. This plot is most
similar to a strip box plot for continuous variables.

An example plot where each individual's PRA score was
plotted against the sum of his EIMCAT and HIYRED score is
shown in Figure 4.17. In Figure 4.17 the two center lines
plotted represent the sum of scores for EIMCAT and HIYRED
separated between the GED qualified personnel and High School
Diploma qualified personnel. The outside two lines trace the
upper and lower bounds one standard deviation from the
computed means.
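
The quantities traced in such a plot can be sketched as follows: for each level of the discrete variable, the mean promotion score and the bounds one standard deviation either side of it. The grouping keys and scores here are made-up illustrations; only the construction of a mean with bounds one standard deviation away comes from the text.

```python
from collections import defaultdict
from statistics import fmean, stdev

def level_bands(pairs):
    """Map each level to (mean - sd, mean, mean + sd) of its promotion scores."""
    by_level = defaultdict(list)
    for level, score in pairs:
        by_level[level].append(score)
    bands = {}
    for level, scores in sorted(by_level.items()):
        m, s = fmean(scores), stdev(scores)  # stdev needs at least two scores
        bands[level] = (m - s, m, m + s)
    return bands

# Hypothetical (diploma group, EIMCAT + HIYRED sum) keys with made-up PRA scores.
sample = [
    (("GED", 7), -0.2), (("GED", 7), 0.0), (("GED", 7), 0.2),
    (("HSDG", 7), 0.1), (("HSDG", 7), 0.3), (("HSDG", 7), 0.5),
]
bands = level_bands(sample)
for key, (low, mean, high) in bands.items():
    print(key, f"{low:.2f} {mean:.2f} {high:.2f}")
```

Plotting the three values per level as connected lines, one set per diploma group, gives a center line with an upper and lower bound for each group.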



Figure 4.17  X-Y Plot of Means and Variances

By plotting a separate line for each high school diploma
category it can be seen that while both groups have a similar
increase in promotion rate as the combined level of EIMCAT
and HIYRED increased, the GED qualified personnel were
consistently a fixed level lower than the fully qualified high
school graduates. Thus, the additional merit of an actual
high school diploma did manifest itself in promotion rate.

A final look at ANOVA involves specifying a model using 
the set of the seven most significant independent variables, 
and then checking for interactions among them. Table XIV 
gives the results of the Seven-Way ANOVA using this model: 

RATE = 7 Main Effects + Two Way Interactions 

Table XIV depicts the seven most significant variables 
individually in the Main Effects rows, and the interaction 
terms in the Interactions rows. 

The advantage of this Seven-Way ANOVA is that inclusion 
of all of the explanatory variables simultaneously allows for 
comparison of the significance of each of the explanatory 
variables relative to the others. Additionally, specifying 
combinations of two-way interactions checks to see if any two 
of the explanatory variables are significantly related to one 
another. An example of an interaction would be a SEX and CMF 
term. As has been previously shown, female personnel tend to 
be associated with higher CMF values. If the ANOVA model for 
promotion included a term which was the product of the two 
values, SEX*CMF, then the two attributes would be jointly 
considered in the ANOVA model. If the interaction term was
found to be significant, then the two individual variable
entries for CMF and SEX would be removed and only the
interaction term retained.

An additional consideration in the Seven-Way ANOVA was
that the model was unbalanced. Unbalanced means that there
were some combinations of the factor levels which did not
have any entries in the ANOVA cells. An example of this can
be seen in the SEX*OAFQT term. Specifically, there are only
76 degrees of freedom for the interaction term, while the
individual degrees of freedom for SEX and OAFQT are 1 and 79
respectively. Thus, the SEX*OAFQT term had three
combinations without entries. As a result, the F statistic
computed will be only approximate. Since the purpose of this
step in analysis was exploratory, the F statistic estimates
were considered adequate.
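
The count of empty combinations follows from simple degrees-of-freedom arithmetic, sketched here in Python (the function name is an illustrative assumption). With every combination present, a two-way interaction carries the product of the two factors' individual degrees of freedom; each combination without entries removes one.

```python
def empty_interaction_cells(df_a, df_b, df_interaction):
    """Number of factor-level combinations with no entries.

    df_a * df_b is the interaction's degrees of freedom when every cell is
    filled; the shortfall against the reported value counts the empty cells.
    """
    return df_a * df_b - df_interaction

# SEX has 1 degree of freedom, OAFQT has 79, and the SEX*OAFQT term reports 76.
missing = empty_interaction_cells(1, 79, 76)
print(missing)  # 3
```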

Table XIV presents the results of a Seven-Way ANOVA using
RATE as the dependent variable. Similar results were
obtained using PRA as the dependent variable.

TABLE XIV   7-Way Analysis of Variance with Interaction

DEPENDENT VARIABLE: RATE

SOURCE             DF       SSQ         MEAN SQUARE   F VALUE   PR > F    R2
MODEL              14966    18869.39    1.260818      1.52      0.0001    0.49852
ERROR              22887    18981.65    0.829364
CORRECTED TOTAL    37853    37851.04                  ROOT MSE  0.91069421

SOURCE             DF       ANOVA SS    F VALUE   PR > F

Main Effects
RACETH             5        807.35      194.69    0.0001
SEX                1        13.28       16.02     0.0001
OAFQT              79       1670.54     25.50     0.0001
HIYRED             12       1238.25     124.42    0.0001
GTSCR              93       1205.22     15.63     0.0001
NCOE               13       945.89      87.73     0.0001
PQSCR              78       507.52      7.85      0.0001

Interactions
RACETH*SEX         5        0.00        0.00      1.0000
SEX*OAFQT          76       440.59      6.99      0.0001  *
SEX*HIYRED         9        66.03       8.85      0.0001  *
SEX*GTSCR          72       72.80       1.22      0.0999
SEX*NCOE           11       57.76       6.33      0.0001  *
SEX*PQSCR          70       53.06       0.91      0.6795
RACETH*OAFQT       335      0.00        0.00      1.0000
RACETH*HIYRED      46       107.84      2.83      0.0001  *
RACETH*GTSCR       326      0.00        0.00      1.0000
RACETH*NCOE        46       8.41        0.22      1.0000
RACETH*PQSCR       288      104.24      0.44      1.0000
OAFQT*HIYRED       593      112.62      0.23      1.0000
OAFQT*GTSCR        2864     2418.55     1.02      0.2570
OAFQT*NCOE         614      954.24      1.87      0.0001  *
OAFQT*PQSCR        3631     3182.33     1.06      0.0137
HIYRED*GTSCR       564      130.88      0.28      1.0000
HIYRED*NCOE        88       276.98      3.80      0.0001  *
HIYRED*PQSCR       518      484.13      1.13      0.0251
GTSCR*NCOE         604      718.86      1.44      0.0001  *
GTSCR*PQSCR        3383     2997.93     1.07      0.0051
NCOE*PQSCR         542      504.44      1.12      0.0268



Three important observations can be obtained from Table
XIV. The first observation is that there are few significant
interaction terms. Only those terms marked with an asterisk
demonstrated statistical significance with the PR > F at
level .0001. Of these, only three had F values greater than
3.8. These interaction terms were OAFQTP, HIYRED, and NCOE,
all interacting with SEX. The presence of interaction seen in
the Seven-Way ANOVA model was previously observed in the
correlation matrix, Table X, where SEX was positively
correlated with HIYRED and OAFQTP (0.05 and 0.131,
respectively), and negatively correlated with NCOE (-0.081).
The implication of having significant interaction terms is
that they would need to be included in any predictive model.
Thus, identification of interactions using ANOVA was
critical.

Secondly, all the main effects variables continue to be 
significant, even when used simultaneously by the model. 

Lastly, selecting the single most significant explanatory 
variable from the academic and education groups yields the 
same unordered best set as did the One-Way ANOVA: OAFQTP, 
HIYRED, GTSCR, NCOE, RACETH, and SEX. 
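
The F values in Table XIV can be cross-checked from the other columns, which also makes explicit how each entry is formed: a term's sum of squares divided by its degrees of freedom, over the error mean square. The sketch below uses figures copied from Table XIV; the variable and function names are illustrative assumptions.

```python
# Error mean square from the top panel of Table XIV (error SSQ / error DF).
ERROR_SS, ERROR_DF = 18981.65, 22887
mse = ERROR_SS / ERROR_DF

# (DF, ANOVA SS, reported F) for a few rows of Table XIV.
rows = {
    "RACETH": (5, 807.35, 194.69),
    "HIYRED": (12, 1238.25, 124.42),
    "NCOE": (13, 945.89, 87.73),
    "SEX*OAFQT": (76, 440.59, 6.99),
}

def f_value(df, ss):
    """F statistic for one term: its mean square over the error mean square."""
    return (ss / df) / mse

recomputed = {name: f_value(df, ss) for name, (df, ss, _) in rows.items()}
for name, (_, _, reported) in rows.items():
    print(f"{name:10s} recomputed F = {recomputed[name]:7.2f}  reported F = {reported}")

# Model R2 is the model sum of squares over the corrected total.
r2 = 18869.39 / 37851.04
print(f"R2 = {r2:.5f}")
```

The recomputed values agree with the reported ones to within rounding of the published sums of squares.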

In summary, the fundamental result of ANOVA was the 
confirmation that there are differences in the level means of 
promotion scores due to several independent explanatory 
variables, and an agreement as to which were the best 
explanatory variables when considered separately or
simultaneously.

Also, plotting the means and variances of the sum of
EIMCAT and HIYRED versus PRA demonstrated that there was a
good increasing linear trend of the level means with PRA.
However, there was considerable variance within each class
level. The choice of EIMCAT and HIYRED as the explanatory
variables was important because those variables are both
discrete representatives from the academic aptitude and
education groups.

2. ANCOVA

The use of One-Way Analysis of Variance in the
previous section was primarily to confirm the existence of
significant differences among the levels of the independent
variables. Beyond acknowledging that there are some
independent variables available to explain promotion rates,
Seven-Way ANOVA did not provide any numerical measure of the
structural form of the contribution of a given independent
variable to the model. [Ref. 14:p. 10] In addition, in
analysis of the continuous variables, the nature of the
variable was changed to represent a discrete valued variable.

Incorporating continuous variables into ANOVA was
achieved through the intermediate method of ANCOVA. ANCOVA
utilizes metric continuous variables as well as nonmetric
qualitative values. The result of ANCOVA was an improved
multivariate model with the inclusion of continuous variables
in their proper form. ANCOVA provided estimates of the
linear coefficients for the continuous variables, and
reported on the proportion of variance accounted for by each
categorical variable as well. These results provided the
basis for further removal of variables or interactions from
the set previously identified. [Ref. 15:pp. 343-349]

The model considered was based on the results of the
previous chapters and consisted of the following form:

Promotion = f (OAFQTP, PQSCR, GTSCR, HIYRED, NCOE, RACETH, SEX
plus interaction terms SEX*HIYRED, SEX*GTSCR, SEX*OAFQTP)

The variables OAFQT, PQSCR, and GTSCR are metric and
continuous, HIYRED and NCOE are discrete and metric, and
RACETH and SEX are discrete and nonmetric.

A representation of the model using notation consisted of
the following form:

Y_i = B_0 + B_1 X_1 + B_2 X_2 + B_3 X_3 + D_1 + ... + D_4 + I_1 + ... + I_3

In the above notation, Y_i is the promotion variable PRA,
B_0 is the linear intercept, and B_1 through B_3 are
coefficients for the continuous variables OAFQT, GTSCR and
PQSCR. The coefficients B_1 through B_3 are assumed to be the
same for all levels of the other variables. D_1 through D_4
represent the discrete variables RACETH, SEX, HIYRED, and
NCOE. I_1 through I_3 are the interaction terms OAFQT*SEX,
HIYRED*SEX, and NCOE*SEX.

This model is also unbalanced and the F statistics are
estimates. The results of the ANCOVA using this model are
shown in Table XV.

TABLE XV   ANCOVA with Interactions

DEPENDENT VARIABLE: PRA

SOURCE        DF       SSQ         MEAN SQUARE   F VALUE   PR > F    R2
MODEL         55       2423.68     44.07         47.13     0.0001    0.0642
ERROR         37798    35339.29    0.934                   ROOT MSE  0.966
CORR TOTAL    37853    37762.98

SOURCE        DF       TYPE III SS      F VALUE   PR > F

Main Effects
OAFQT         1        12.89440024      13.79     0.0002
RACETH        5        152.10095609     32.54     0.0001
SEX           1        5.31950192       5.69      0.0171
HIYRED        12       517.91751116     46.16     0.0001
GTSCR         1        3.65772995       3.91      0.0479
NCOE          13       132.83314221     10.93     0.0001
PQSCR         1        80.15632971      85.73     0.0001

Interactions
OAFQT*SEX     1        4.03387863       4.31      0.0378
SEX*HIYRED    9        10.16825209      1.21      0.2844
SEX*NCOE      11       18.42527136      1.79      0.0496

                             T FOR H0:                  STD ERROR OF
PARAMETER     ESTIMATE       PARAMETER=0    PR > |T|    ESTIMATE
INTERCEPT     0.25501        0.31           0.7592      0.83191986
OAFQT         0.00094        1.26           0.2077      0.00074544
GTSCR         -0.00104897    -1.98          0.0479      0.00053034
PQSCR         0.00422902     9.26           0.0001      0.00045674


There are three important observations from Table XV.
First, the main effects variables, with the exception of
GTSCR, are still significant in their ability to account for
variance in the model.

Secondly, no interaction terms are significant. The PR >
F values for these terms are much greater than .0001 and each
has a small F value. Thus, the effect of the interaction terms
will be assumed to be negligible.

Lastly, the bottom portion of the ANCOVA table lists
estimates of regression coefficients for the continuous
variables. These estimates were tested, using the T
statistic, to see if they were significantly different from a
hypothesized value of zero. If the estimate was not
significantly different from zero, then the explanatory
variable did not possess sufficient predictive ability.

The PQSCR coefficient has a small, but positive slope 
with a value of 0.0042, and is significantly different from 
zero. The OAFQT variable has a slope with the correct sign 
and magnitude, but it is not significantly different from 
zero. The GTSCR variable demonstrates a negative slope and 
again is not significantly different from zero. 
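
The T statistics in the bottom portion of Table XV can be reproduced as each estimate divided by its standard error. The sketch below does this with the tabled values; the two-sided probability uses a standard normal in place of the T distribution, which is an assumption that is reasonable only because the error degrees of freedom (37,798) are very large. Function and variable names are illustrative.

```python
from statistics import NormalDist

# (estimate, standard error, reported T, reported Pr>|T|) from Table XV.
estimates = {
    "INTERCEPT": (0.25501, 0.83191986, 0.31, 0.7592),
    "OAFQT": (0.00094, 0.00074544, 1.26, 0.2077),
    "GTSCR": (-0.00104897, 0.00053034, -1.98, 0.0479),
    "PQSCR": (0.00422902, 0.00045674, 9.26, 0.0001),
}

def t_and_p(estimate, std_error):
    """T statistic for H0: parameter = 0, with a large-sample two-sided p-value."""
    t = estimate / std_error
    p = 2 * (1 - NormalDist().cdf(abs(t)))
    return t, p

results = {name: t_and_p(est, se) for name, (est, se, _, _) in estimates.items()}
for name, (t, p) in results.items():
    print(f"{name:10s} T = {t:6.2f}  Pr>|T| = {p:.4f}")
```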

The negative estimate value, combined with the knowledge
that GTSCR is strongly correlated with OAFQT, indicated a
condition of multicollinearity between the two variables.
Multicollinearity implies that one variable may be simply a
surrogate for the other with little or no effect as a
predictor. [Ref. 15:p. 415] Thus, the inclusion of GTSCR
together with OAFQT was considered detrimental to the
development of a regression model, and it was dropped from
subsequent analysis.

In summary, ANCOVA resulted in the elimination of the
remaining interaction terms from consideration in the
predictive model. The estimated values of OAFQT and GTSCR
demonstrated a condition of multicollinearity in the model,
and the weaker variable, GTSCR, was eliminated. The
remaining variables to be considered in subsequent analysis
were: OAFQT, PQSCR, HIYRED, NCOE, RACETH, and SEX. These
results were considered satisfactory, in that the remaining
variable set contains single measures of academic aptitude,
education, professional education, military performance
testing, as well as two categorical variables: SEX and
RACETH.



3. The Final Model: A Multiple Regression (ANCOVA)

a. Background

Regression analysis with a reduced set of variables was
the final step in successive data analyses. The important
result of this analysis was a set of coefficient values
which provided numerical estimates of the independent
influence of each of the explanatory variables. Of specific
importance was the independent influence of OAFQT and HIYRED
in predicting an individual promotion rate.

In the development of the regression model this section
will:

1. Review the pertinent results which led to the
regression model definition.

2. Compare the model using the three promotion rate
variables.

3. Select a single promotion variable for the model.

4. Interpret the resulting regression estimates and
conduct sensitivity analysis.

5. Check model assumptions and confirm the model using
an alternate data set and nonparametric procedures.

6. Test the model by comparing actual versus predicted
promotion rates for population subcategories.

Previous results are reviewed in the following paragraphs.

ANOVA and ANCOVA demonstrated that significant 
differences exist between internal levels of the explanatory 
variables as a function of average promotion rates. 

Paired scatterplots utilizing smoothing techniques, and 
plots of the level means found in ANOVA, consistently 
demonstrated an ascending linear pattern when plotted against 
promotion variables.

ANOVA and ANCOVA models, using interactions, resulted in 
the elimination of variables which did not demonstrate 
sufficient linear additive effect to be included in the 
model. Further, this analysis confirmed that there was no 
significant interaction among the remaining variables. 
Correlation analysis, combined with the in-depth univariate 
analysis as to the nature and scoring procedures of the 
individual variables, identified groups of variables. In 
subsequent analysis, these groups were then restricted to 
allow for only the strongest unique variable to be entered 
into the model.

The final set of variables for entry into the model is
the following:

Promotion = f (OAFQT, PQSCR, HIYRED, NCOE, RACETH, SEX)

This model is a mixed scale and variable type model,
including both discrete and continuous variables. Two of the
input variables have nominal scale, RACETH and SEX. To allow
for their entry into the model, these values were transformed
into dummy variables. Specifically, the variable SEX was
recoded as a 0/1 variable, while RACETH was represented with
five dummy 0/1 variables: D1 through D5. For example, for
the RACETH score of 1, the dummy variable D1 was coded with a
1 for every 1 entry and a zero for all others. This
procedure was applied for the next four levels, while score 6
was left as an all-zero entry. [Ref. 15:pp. 332-341]
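
A minimal Python sketch of the RACETH recoding just described follows. The function name is an illustrative assumption; the coding rule (one 0/1 indicator for each of scores 1 through 5, with score 6 left as all zeros) is taken from the text.

```python
def dummy_code(raceth, levels=(1, 2, 3, 4, 5)):
    """Return [D1, ..., D5] for a RACETH score.

    Each dummy is 1 only when the score equals its level; score 6 matches
    none of the levels and so serves as the all-zero reference category.
    """
    return [1 if raceth == level else 0 for level in levels]

for score in (1, 3, 6):
    print(score, dummy_code(score))
```

Leaving one level without its own indicator avoids a redundant column: with an intercept in the model, the coefficients on D1 through D5 are then read as differences from the score 6 group.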

After application of the recoding just described, the
regression model can be defined with the notation:

Y_i = B_0 + B_1 X_1 + B_2 X_2 + B_3 X_3 + B_4 X_4 + D_1 + ... + D_5 + D_6

In the above notation, Y_i is one of the promotion
variables. B_0 is the linear intercept, and B_1 and B_2 are
coefficients for the continuous variables OAFQT and PQSCR.
B_3 and B_4 are coefficients for the discrete and ordinal
variables HIYRED and NCOE. D_1 through D_5 represent the dummy
variables for RACETH, and D_6 represents the dummy variable
for SEX.

The data set of 37,854 records was randomly split into 
two separate data files for regression analysis. This 
provided for a different data set to confirm analysis of 
regression coefficients from the first set. Paragraph e.(1)
of this section compares resulting regression coefficients of
the model using the second data set.
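
One way to make such a random partition is sketched below in Python. The function name, the fixed seed, and the even split are all assumptions for illustration; the text does not state the relative sizes of the two files or how the randomization was performed.

```python
import random

def split_records(records, fraction=0.5, seed=0):
    """Randomly partition records into two lists.

    `fraction` is the share placed in the first list (0.5 is a hypothetical
    default); `seed` makes the partition reproducible.
    """
    rng = random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * fraction)
    return shuffled[:cut], shuffled[cut:]

# Record identifiers standing in for the 37,854 NCO records.
first_set, second_set = split_records(range(37854))
print(len(first_set), len(second_set))
```

Fitting the model on the first list and refitting on the second gives two independent sets of coefficient estimates to compare.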
b. Results

Table XVI lists the regression results of the
basic model variables. When computing models for PRATE and
RATE, the effects variables CMF and PAYGD, and then CMF
alone, were reintroduced into the set of explanatory
variables, respectively. This allowed for comparison of variable
coefficients and R2 value changes as the dependent variable
became more restricted. In Table XVI the top section shows
the ANOVA results of each model and reports the F and R2
statistics. Each column then gives the regression results of
one promotion rate model, including a Pr>T value as a measure
of the strength of rejection of the null hypothesis that the
estimate is zero. Values of Pr>T less than .05 are
considered acceptable for consideration of that variable.

TABLE XVI   Regression Results

                     PRATE           RATE            PRA

Added Variables      CMF, PAYGD      CMF             None
ANOVA F              1317.4          360.3           218.5
Pr>F                 .0001           .0001           .0001
R2                   .3116           .0948           .0546

Intercept            0.022222        -1.03692        -1.28822
(std error)          (.002558)       (.055368)       (.05600)
Pr>T                 .0001           .0001           .0001

OAFQT                .0001355        .0058817        .0042608
(std error)          (.00000871)     (.0002444)      (.0002492)
Pr>T                 .0001           .0001           .0001

HIYRED               .0005341        .148352         .139484
(std error)          (.000152)       (.004851)       (.0049298)
Pr>T                 .0001           .0001           .0001

PQSCR                .000089         .001608         .00327211
(std error)          (.000014)       (.000449)       (.0004583)
Pr>T                 .0001           .0001           .0001

SEX                  -.0008582       .022904         .0564079
(std error)          (.00050325)     (.01562)        (.0155310)
Pr>T                 .088*           .1427*          .0003

NCOE                 .00008839       .012688         .0073740
(std error)          (.00000625)     (.0017808)      (.0017949)
Pr>T                 .1573*          .0001           .0001

D1 (RACETH)          .0026347        .053088         .01497054
(std error)          (.0011286)      (.035653)       (.0363905)
Pr>T                 .0196           .1365*          .6808*

D2 (RACETH)          -.0037888       -.096320        -0.0898693
(std error)          (.0011266)      (.035570)       (.0363089)
Pr>T                 .0008           .0068           .0013

D3 (RACETH)          -.0009404       -.0239592       -.0417668
(std error)          (.001279)       (.040383)       (.04122033)
Pr>T                 .4623*          .5530*          .3109*

D4 (RACETH)          .00028892       .089059         .01007473
(std error)          (.0032534)      (.102707)       (.1048355)
Pr>T                 .3745*          .3859*          .9234*

D5 (RACETH)          -.000224        -.021530        -.0138649
(std error)          (.0018127)      (.0572261)      (.058409)
Pr>T                 .9016*          .7067*          .8124*

CMF                  -.000147        -.0053672       NA
(std error)          (.0000052)      (.0001654)
Pr>T                 .0001           .0001

D7 (PAYGD)           .060127         NA              NA
(std error)          (.0017904)
Pr>T                 .0001

D8 (PAYGD)           .017999         NA              NA
(std error)          (.001774)
Pr>T                 .0001

Observations from the regression table are summarized in
the following paragraphs.

The input variables OAFQT, HIYRED, and PQSCR all
maintained a positive and statistically significant
coefficient value across all three dependent variables.

The inclusion of PAYGD with the PRATE variable
significantly increased the R2 value of the model.
Conversely, the influence of OAFQT, HIYRED, PQSCR, and the
other explanatory variables was severely diminished.

The RATE model is very similar to the PRA model, and has
generally larger estimate values and a higher R2. However,
the estimates for RACETH and SEX did not have significant T
values.

The PRA model, although having a lower R2 value and
generally smaller estimate values, had an acceptable T test
result for SEX. Additionally, the PRA model contained one
less nominal explanatory variable, CMF. The PRA model, then,
has fewer and more reliable nominal explanatory variables.
Since the objective of the study was to focus on academic and
educational measures as predictors of promotion, the PRA
model was chosen as the most effective predictive model.
Subsequent analysis of regression coefficient results was
conducted with the PRA model.

c. Interpretation

Interpretation of the regression coefficients
will include two points. First, the explanatory variables
which can effect the greatest change in the dependent
variable will be identified. Secondly, an example will
demonstrate the amount of change in a given explanatory
variable required to achieve a five percent shift in the PRA
estimate.

The amount of change in PRA caused by a change of one unit
of an explanatory variable can be read directly from the
regression coefficients. However, the total amount of change
that an explanatory variable can cause in PRA depends on the
range of the explanatory variable. Table XVII gives an
ordered listing of the explanatory variables, excluding
categorical variables, from most to least total influence as
measured by Net Possible Change. The net possible change is
simply the number of units in the range of the explanatory
variable multiplied by the coefficient estimate.



TABLE XVII   Net Possible Change by Explanatory Variable

Variable     Range      Estimate      Net Possible Change
HIYRED       1-12       .13948378     1.6738
OAFQT        1-99       .00426083     0.4218
PQSCR        21-100     .00327212     0.2585
NCOE         0-14       .00737408     0.1106
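
The Net Possible Change column of Table XVII can be reproduced as units times estimate, as in the Python sketch below. The unit counts (12, 99, 79 and 15) are not stated in the table; they are assumptions back-solved from the published products. Three of them equal the number of values in the listed range, while the PQSCR figure equals the difference between its endpoints.

```python
# variable: (assumed number of units in the range, coefficient estimate, tabled result)
table_xvii = {
    "HIYRED": (12, 0.13948378, 1.6738),
    "OAFQT": (99, 0.00426083, 0.4218),
    "PQSCR": (79, 0.00327212, 0.2585),
    "NCOE": (15, 0.00737408, 0.1106),
}

def net_possible_change(units, estimate):
    """Total shift in PRA available from moving a variable across its range."""
    return units * estimate

net = {name: net_possible_change(u, b) for name, (u, b, _) in table_xvii.items()}
# List the variables from most to least total influence.
for name in sorted(net, key=net.get, reverse=True):
    print(f"{name:7s} {net[name]:.4f}")
```

The ordering makes the point of the table: a large coefficient on a short range (HIYRED) can outweigh a small coefficient on a long one (OAFQT).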



In a qualitative sense, the sensitivity of PRA to each
explanatory variable can be demonstrated by deriving the
number of explanatory variable units needed to move from the
median PRA value up five percent.

To compute the average value for PRA, the population
average for each explanatory variable was entered into the
regression model. The resulting PRA value was 0.0185, which,
using the normal approximation, lies at the 50.7 percentile
of the PRA distribution. An upward shift of 5 percent would
then require the PRA value to lie at the 55.7 percentile.
Using the standard normal tables to approximate the PRA
distribution, the PRA value corresponding to its 55.7
percentile was 0.1434. Checking the sensitivity of each
explanatory variable consisted of changing a single
explanatory variable a sufficient number of units to result
in a PRA value of 0.1434, while holding all other explanatory
variables at the population average. Table XVIII tabulates
the increase of explanatory variable units necessary to
produce a 5 percent upward shift in PRA percentile.
Alternatively, if the amount required to reach the 55.7
percentile was not possible within the range of the input
variable, the maximum amount of available change was listed.



TABLE XVIII   Sensitivity of PRA to Explanatory Variables

Variable     Average Value     Change to     PRA % Change
HIYRED       6.01              7.0           55.9
OAFQT        45.3              74.0          55.7
NCOE         3.06              14.0*         54.0
PQSCR        78.4              99.0*         53.4

*max value
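
The sensitivity calculation can be sketched in Python as follows. The sketch assumes, as the text does, that PRA is approximated by a standard normal distribution, and it uses the PRA-model coefficients from Table XVI; the function names are illustrative. Because the published entries were read from printed normal tables and rounded, the recomputed percentiles agree with Table XVIII only to within a few tenths of a point.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # assumption: PRA treated as approximately standard normal
BASE_PRA = 0.0185          # PRA predicted at the population averages

def percentile(pra):
    """Percentile of a PRA value under the standard normal approximation."""
    return 100 * STD_NORMAL.cdf(pra)

# PRA value that sits five percentile points above the base (50.7 -> 55.7).
target_pra = STD_NORMAL.inv_cdf(0.557)

def shifted_percentile(coef, average, new_value):
    """Percentile after moving one variable from its average to new_value."""
    return percentile(BASE_PRA + coef * (new_value - average))

# variable: (PRA coefficient, average value, value changed to) from Tables XVI and XVIII
cases = {
    "HIYRED": (0.139484, 6.01, 7.0),
    "OAFQT": (0.0042608, 45.3, 74.0),
    "NCOE": (0.0073740, 3.06, 14.0),
    "PQSCR": (0.00327211, 78.4, 99.0),
}
shifted = {name: shifted_percentile(*args) for name, args in cases.items()}
print(f"base percentile = {percentile(BASE_PRA):.1f}, target PRA = {target_pra:.4f}")
for name, pct in shifted.items():
    print(f"{name:7s} {pct:.1f}")
```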









Interpretation of the coefficient values clearly
demonstrates that HIYRED is the most important explanatory
variable. This observation is understandable since the
structure of the variable is discrete, and changes to
adjacent values represent major distinctions in educational
background. For example, shifting from a value of six to a
value of seven represents the difference between having a high
school degree and having gone to one year of college. In
percentages of HIYRED, that constitutes moving from a large
center group of high school qualified NCO's to the upper
ninety percent of the HIYRED distribution.

OAFQT is the second most significant explanatory variable. 
A shift of roughly one quarter of its range, i.e. 45 to 75, 
can change PRA plus or minus five percent. The other 
explanatory variables NCOE and PQSCR have considerably less 
influence on the dependent variable. 

d. Checking of Assumptions 

To verify the requirements for the regression
model, residual analysis was performed using the Grafstat
program. Representative plots of the OAFQT residual are
shown in Figures 4.18 and 4.19.

Figure 4.18  Regression Residual Histogram

Figure 4.19  Regression Residual Scatter Plot (N=500),
residuals plotted against OAFQTP

The histogram of residuals, shown in Figure 4.18,
demonstrates that the residual distribution is approximately
normal. Homoscedasticity is checked in Figure 4.19, in which
residuals have been plotted against the OAFQT variable.
There do not appear to be any patterns in the plots of the
residuals, and the uniform pattern was considered sufficient
to justify the assumption of homoscedasticity. Lastly, since
each observation represents a different person, the
independence of each observation from one another is assumed
true.

e. Confirmation of Regression Findings 

(1) Second Data Set. Regression analysis was
conducted on the second partition of the data set. A 
comparison of those results with the first data set is shown 
in Table XIX. 



TABLE XIX   Comparison of Regression Data Sets

                                    PRA
Independent            1st Set                 2nd Set
Variable Estimator     Coeff      Std Err      Coeff      Std Err
OAFQT                  .004260    (.00025)     .004729    (.00032)
HIYRED                 .139483    (.00493)     .131559    (.00636)
PQSCR                  .003272    (.00046)     .003197    (.00060)

The above results are felt to be sufficiently comparable 
to accept the original model coefficient scores. 
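
One hedged way to put a number on "sufficiently comparable" is to express each difference between the two sets of coefficients in units of its combined standard error. This z comparison is an illustration added here, not the procedure used in the study; it treats the two partitions as independent samples, and the names are assumptions.

```python
from math import sqrt

# variable: (coeff set 1, std err set 1, coeff set 2, std err set 2) from Table XIX
table_xix = {
    "OAFQT": (0.004260, 0.00025, 0.004729, 0.00032),
    "HIYRED": (0.139483, 0.00493, 0.131559, 0.00636),
    "PQSCR": (0.003272, 0.00046, 0.003197, 0.00060),
}

def z_difference(b1, se1, b2, se2):
    """Difference of two independent estimates over its standard error."""
    return (b1 - b2) / sqrt(se1 ** 2 + se2 ** 2)

z_scores = {name: z_difference(*vals) for name, vals in table_xix.items()}
for name, z in z_scores.items():
    print(f"{name:7s} z = {z:+.2f}")
```

Values well inside plus or minus 1.96 mean the two estimates do not differ by more than sampling variation would allow at the five percent level.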

(2) Nonparametric Regression. Since the model
contained an ordinal variable, HIYRED, a regression result
using nonparametric terms was included as a confirmatory
measure. Nonparametric regression produced the same linear
least squares approximation for the model estimates, so the
regression coefficient for HIYRED was still 0.1395. However,
for nonparametric regression the test for the acceptance of
the estimate value used the Spearman rank correlation
coefficient. The regression coefficient for HIYRED was
tested using this procedure.

First, for each value of PRA and HIYRED a predicted value
U was found by computing U = PRA - (0.1395 * HIYRED). Then,
the Spearman rank correlation coefficient, rho, was computed,
based on the ranks of HIYRED and the ranks of U. It was
found to be 0.02482 with a Pr>|R| of 0.0001. In this test
the null hypothesis was that the value of the regression
coefficient was equal to 0.1395, the value found in
regression. [Ref. 13:pp. 265-271] To test the null
hypothesis, that the regression coefficient estimate is
correct, rho was compared against a rejection region computed
using the two tailed Spearman Quantile, with a normal
approximation. The rejection regions for this Spearman
Correlation parameter were values less than 0.0085 or greater
than 0.9915. Since the value of rho did not fall inside
either rejection region, the null hypothesis could not be
rejected, and a HIYRED regression coefficient of .1395 was
acceptable.
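
The two computational steps of this check, forming U and then rank-correlating it with HIYRED, can be sketched in Python as below. The helper names and the toy data are illustrative assumptions, and the sketch stops at rho; it does not reproduce the rejection region quoted above.

```python
from statistics import fmean

def ranks(values):
    """1-based ranks, with tied values sharing the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result

def pearson(x, y):
    """Product-moment correlation of two equal-length sequences."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def spearman(x, y):
    """Spearman rho: the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

def slope_check(hiyred, pra, slope=0.1395):
    """Rho between HIYRED and U = PRA - slope * HIYRED."""
    u = [p - slope * h for h, p in zip(hiyred, pra)]
    return spearman(hiyred, u)

# Made-up data: a rho near zero suggests no trend is left after removing the slope.
hiyred = [5, 6, 6, 7, 8, 10]
pra = [0.75, 0.80, 0.95, 1.00, 1.05, 1.45]
print(f"rho = {slope_check(hiyred, pra):.3f}")
```

If the hypothesized slope is too small, U still rises with HIYRED and rho moves toward +1; if it is too large, rho moves toward -1.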
f. Testing the Model

The model coefficients found by regression were
tested in two ways. First, a predicted promotion rate value
was computed for the extremes and average of the model. The
extreme values used the minimum or maximum values for the
input variables. The average promotion rate was computed
using sample averages for all input variables. The resulting
predictions were then compared against the actual
distribution percentiles.

Secondly, subsets of the sample population had average
promotion rates predicted using categorical values and sample
population averages. The resulting predictions were compared
against the actual sample values. Again, percentile values
for PRA were found by using a standard normal table
approximation.



TABLE XX   Comparison of Extreme and Average Predictions

             Model Prediction             Data Sample Percentile
             PRA Value     Percentile     PRA Value     Percentile

Minimum      -1.0009       15.7%          -1.558        5%
             (.1000)       (3.5%)

Maximum      1.23029       89.1%          1.7866        95%
             (.4098)       (9.9%)

Average      0.01839       50.7%          -0.04146      50%
             (0.223)       (8.5%)

The model predictions were very accurate at the average
level, but this accuracy diminished at the extremes.

The second test for the model was one where specific
population subcategories had their average PRA value
predicted. The subcategories represented were four
combinations of SEX and the black and white RACETH variables.
Additionally, predictions were made to check the average
promotion rate of all NCO's with a HIYRED value of 10, and
all NCO's with an OAFQT of 85. As in the previous table,
unless the input variable is being used as a subcategory, its
value was set to the overall population average. Table XXI
shows the results of the predictions.



TABLE XXI  Comparison of Predicted VS Actual PRA Averages

Subcategory     Predicted %   (Lower-Upper)   Sample %   Sample Size
Male/White         55.1        (45.7-64.2)      53.1       18,003
Male/Black         49.5        (40.3-58.9)      44.3       12,121
Female/Black       47.3        (37.7-56.1)      47.7        2,485
Female/White       52.9        (44.1-61.5)      59.5        1,842
HIYRED=10          71.7        (63.5-79.3)      75.7          969
OAFQT=85*          57.4        (44.7-69.4)      60.2        2,129

*The sample data point estimate was averaged over a
range of OAFQT 80 to 90.

Testing of the regression model indicates that it was
reasonably effective if used with input changes of the
nominal variables, such as SEX and RACETH. Changes in the
value of HIYRED produced reliable estimates, and demonstrated
the considerable contribution of this variable as a predictor
of PRA. The continuous variable OAFQT is difficult to test;
since it is continuous, the model estimate was taken over a
range of values. Predicted results are close to the sample
value, but the variance of the estimate still spans the
median. OAFQT does move the predicted values of PRA in the
right direction, but its effectiveness is severely hampered
by its variance and diminishing ability to provide an
accurate prediction value as PRA approaches either extreme.
Other prediction estimates were attempted using OAFQT and
their results demonstrated the same lack of predictive
ability away from the center percentiles.
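
The observations above can be checked directly against the
rows of Table XXI. The following is a minimal sketch, not part
of the original analysis; the function and field names are
hypothetical. For each subcategory it asks whether the sample
percentage falls inside the predicted bounds and whether those
bounds still span the median (the 50th percentile).

```python
# Rows of Table XXI:
# (subcategory, predicted %, lower %, upper %, sample %)
TABLE_XXI = [
    ("Male/White", 55.1, 45.7, 64.2, 53.1),
    ("Male/Black", 49.5, 40.3, 58.9, 44.3),
    ("Female/Black", 47.3, 37.7, 56.1, 47.7),
    ("Female/White", 52.9, 44.1, 61.5, 59.5),
    ("HIYRED=10", 71.7, 63.5, 79.3, 75.7),
    ("OAFQT=85", 57.4, 44.7, 69.4, 60.2),
]

MEDIAN_PERCENTILE = 50.0


def check_row(row):
    """Summarize how one prediction compares with the sample."""
    name, predicted, lower, upper, sample = row
    return {
        "name": name,
        "sample_within_bounds": lower <= sample <= upper,
        "bounds_span_median": lower <= MEDIAN_PERCENTILE <= upper,
        "error": round(predicted - sample, 1),
    }


results = [check_row(row) for row in TABLE_XXI]
for result in results:
    print(result)
```

Every sample value lies inside its predicted bounds, but only
the HIYRED=10 bounds exclude the median, which is consistent
with the remark that the OAFQT estimate still spans it.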
g. Summary of Regression Analysis 

Regression analysis provided estimates of the
independent contribution of several key variables to
predicting a promotion rate. They include a measure of
intelligence aptitude, OAFQTP; a measure of academic ability,
HIYRED; two measures of military performance, PQSCR and NCOE;
and two nominal values, SEX and RACETH.

Testing of these estimates shows that the predictive
ability of the model is limited to those variables which have
very distinct abilities to subcategorize the sample
population. These variables are the SEX, RACETH, and HIYRED
variables. The continuous variables OAFQT and PQSCR cannot
be relied upon to independently yield estimates of PRA, but
can effect limited shifts of the PRA distribution within a
subcategory.

E. SUMMARY OF FINDINGS 

Chapter IV was the principal analytical exercise in this
study. It progressed through ascending stages of analysis
and resulted in an inferential model with a restricted and
independent set of explanatory variables. These explanatory
variables did, in fact, rely on levels of intelligence tests
and academic background as values to predict promotion.

The model, however, demonstrated only limited utility as a
predictive equation. It could only match the sample data when
it was describing an average promotion rate among a large
population subcategory. This would occur only where the
change in the explanatory variable had a significant
partitioning effect on the population.

The next two chapters will investigate the relationship of
intelligence and academic ability as a predictor of promotion
rate but through different procedures.

V. ANALYSIS OF TOP PERFORMERS

A. INTRODUCTION

This chapter took an ad hoc approach to identify any 
trends which distinguish top performers, on the basis of 
promotion rate, from their peers. Top performers consist of 
the top three percent of the population, or 1,047 
individuals, according to PRA scores. This data set was 
referred to as the TOP data set, while the remainder were 
referred to as the SAMPLE data set. 

Analysis consists of three sections. The first section
is a comparative tabulation of means and variances. Results
shown in this section confirmed the majority of sample
characteristics predicted in Chapter IV, such as higher
EIMCAT and OAFQT scores. There were, however, discrepancies
with respect to TOP distribution values of RACETH, NCOE and
PAYGD. Those discrepancies are investigated in later
sections of this chapter. The second section reports the
results of formal hypothesis testing for differences in means
between each of the explanatory variables. The last section
investigates the discrepancies associated with RACETH, NCOE,
and PAYGD. Through a presentation of graphics demonstrating
internal shifts of those variable distributions, an effect
which appears to interrelate the three distributional
discrepancies is identified.

B. COMPARISON OF MEANS AND VARIANCE

The tabulated means and variances of the study variables
for the top three percent and for the remainder of the entire
sample are presented in Table XXII. The last column in the
table shows the percentage and direction that the TOP data
set differed from the SAMPLE.



TABLE XXII  Top vs Sample: Summary Data

                     Top 3%               Sample
Variable/Type     Mean    Std Dev     Mean    Std Dev    Comment
Promotion
   RATE            2.06     .392       0.00     1.00
   PRATE           .178     .037       .109     .036
   PRA             2.33     .350       0.00     1.00
Intelligence
   AFQTP          64.69    22.01       53.4     20.9     Top 17.5% >
   OAFQTP         61.60    23.24       45.3     24.7     Top 26.4% >
   EIMCAT          6.11     1.31       5.07     1.28     Top 17.0% >
   GTSCR         113.17    14.70      108.3     14.2     Top 4.1% >
   HIYRED          6.88     1.59       6.01     1.07     Top 12.6% >
   EDLVL           7.12     1.55       6.32      .97     Top 11.2% >
   PQSCR          80.57    11.31       78.4      1.6     Top 2.6% >
   NCOE            2.31     2.50       3.06     2.81     Top 33% <
Effects
   SEX             1.18     .390       1.12     .328     Top 5% >
   CMF            62.09   27.146       51.9     31.3     Top 16% >
   RACETH          1.58     .975       1.65     .942     Top 4% <
   PAYGD           5.19     .405       5.27     .464     Top 3% <


Observations derived from the data in Table XXII can be 
summarized as follows: 

The four aptitude test variables, GTSCR, AFQTP, OAFQTP and
EIMCAT, all demonstrate a strong positive difference between
the TOP and SAMPLE scores. The AFQT related scores are about
twenty percent greater, with GTSCR greater by four percent.
The variables EDLVL and HIYRED were both positive, with
HIYRED slightly larger at twelve percent. PQSCR increased
slightly.

The effects variables SEX and CMF both increased, with
CMF demonstrating a significant increase. The change in CMF
was an unexpected result of subsetting to the top three
percent. The PRA variable was designed to be independent of
CMF, and it should not have been affected as significantly as
it was.

The only variables which decreased in proportion between 
SAMPLE and TOP were NCOE, RACETH, and PAYGD. Of the three, 
NCOE was the largest. The change in NCOE was also an 
unexpected result. Regression analysis indicated that NCOE 
had a positive influence on PRA. To have NCOE decrease with 
top performers is the reverse result. Paragraph D of this 
section will attempt to explain the reason for this anomaly. 
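
The text does not spell out how the percentages in the Comment
column of Table XXII were computed. Most of the printed values
are reproduced closely if the difference between the two means
is expressed as a percentage of the TOP mean; that basis is
inferred here rather than stated in the source, and a few rows
(PAYGD, for example) do not follow it exactly. A minimal
sketch under that assumption, with hypothetical names:

```python
# (Top 3% mean, Sample mean, percent printed in the Comment
# column), copied from Table XXII.
TABLE_XXII_MEANS = {
    "AFQTP": (64.69, 53.4, 17.5),
    "OAFQTP": (61.60, 45.3, 26.4),
    "EIMCAT": (6.11, 5.07, 17.0),
    "HIYRED": (6.88, 6.01, 12.6),
    "EDLVL": (7.12, 6.32, 11.2),
    "NCOE": (2.31, 3.06, 33.0),
    "CMF": (62.09, 51.9, 16.0),
}


def percent_difference(top_mean, sample_mean):
    """Signed difference of the TOP mean from the SAMPLE mean,
    as a percent of the TOP mean (an assumed convention)."""
    return 100.0 * (top_mean - sample_mean) / top_mean


for name, (top, sample, printed) in TABLE_XXII_MEANS.items():
    computed = percent_difference(top, sample)
    direction = ">" if computed > 0 else "<"
    print(f"{name:7s} computed={abs(computed):5.1f}% {direction} "
          f"(printed {printed}%)")
```

The sign of the result gives the direction marker: NCOE is the
only variable in this subset for which the TOP mean is lower.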

C. SIGNIFICANCE TESTING 

Significance testing for means of the explanatory
variables between the TOP and SAMPLE data set was included as
a formal statistical confirmation of differences between the
two data sets. Testing using nonparametric methods was
utilized since the study variables were either discrete, or
if continuous, did not meet the Kolmogorov-Smirnov one-sample
test for a normal distribution. The type of nonparametric
test used depended on the type of scale of the variable and
whether it was continuous or discrete.

TABLE XXIII  Top vs Sample Hypothesis Results

Variable      Test Used                   Statistic        Results
Intelligence
   GTSCR      Kruskal-Wallis Test ¹       Chisq = 671      Strongly reject H0
   AFQTP      Kruskal-Wallis Test         Chisq = 1165     Strongly reject H0
   OAFQTP     Kruskal-Wallis Test         Chisq = 1418     Strongly reject H0
   EIMCAT     2 x C Contingency Table ²   Chisq = 503      Strongly reject H0
   HIYRED     2 x C Contingency Table     Chisq = 931      Strongly reject H0
   EDLVL      2 x C Contingency Table     Chisq = 700      Strongly reject H0
   PQSCR      Kruskal-Wallis Test         Chisq = 26.1     Reject H0
   NCOE       2 x C Contingency Table
Effects
   SEX        2 x C Contingency Table     Chisq =
   CMF        2 x C Contingency Table     Chisq =          Strongly reject H0
   RACETH     2 x C Contingency Table     Chisq =          Reject H0
   PAYGD      2 x C Contingency Table     Chisq =          Strongly reject H0


¹ For this nonparametric test the null hypothesis is that
the populations are identical. The alternate hypothesis is
that one of the populations yields larger observations. With
two populations this is equivalent to a Mann-Whitney test.
At the .95 confidence level (α = .05) the critical Chi-square
value for rejection is Chisq > 3.84.

² For this nonparametric test the null hypothesis is that
the two populations have the same distribution as measured by
the probability of falling into one of the discrete variable
classifications. The alternate hypothesis is that the
distributions are different. The contingency table is set
for the two rows to be the classification of PRA > 1.93 and
PRA < 1.93; the C represents the number of discrete levels in
the variable being tested. The Chi-square test statistic is
also used for this test, with a rejection of H0 when Chisq is
larger than 3.84 at the .95 level.

Hypothesis testing confirms the observations made on
simple means and variances of the study variables. The
strength of the difference can be interpreted by the
magnitude of the Chi-square statistic.
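
As a rough illustration of the two quantities involved, the
sketch below computes the Chi-square critical value for one
degree of freedom and the Pearson Chi-square statistic for a
2 x C table of counts. It is not the SAS procedure used in the
study; the function names are hypothetical, and the counts in
the usage lines are made up for illustration, not thesis data.
The 3.84 threshold quoted in the footnotes corresponds to one
degree of freedom; a 2 x C table in general has C - 1 degrees
of freedom.

```python
from statistics import NormalDist


def chi_square_critical_df1(confidence=0.95):
    """Critical value of a chi-square variable with 1 degree of
    freedom. Such a variable is the square of a standard normal,
    so the critical value is the squared two-sided quantile."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    return z * z


def pearson_chi_square(table):
    """Pearson chi-square statistic for a 2 x C table given as
    two rows of counts. Assumes no row or column total is zero."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    return statistic


# Illustrative (made-up) counts: row 1 = top group, row 2 = rest.
identical = [[10, 20, 30], [10, 20, 30]]
separated = [[10, 0], [0, 10]]
print(round(chi_square_critical_df1(), 2))
print(pearson_chi_square(identical), pearson_chi_square(separated))
```

Identical row distributions give a statistic of zero; the more
the two rows differ, the larger the statistic becomes.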

D. ANALYSIS OF DISTRIBUTIONS 

This section further investigates the shifts in 
distributions for those variables which conflicted with the 
relationships derived in regression and correlation analysis. 
Those variables were CMF, NCOE and PAYGD. Again, the 
conflicts which arose were two-fold. 

First, neither CMF nor PAYGD should have been affected by
subsetting of the PRA variable. The PRA scores are normalized
differences from the average score for every paygrade and CMF
combination. Assuming a uniform application of promotion
policy, then, no one CMF or paygrade should have dominated as
a result of subsetting to the top three percent. Second,
NCOE should have increased slightly rather than decreased
significantly by subsetting to the top three percent.
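
The idea of normalizing within each paygrade and CMF
combination can be sketched as follows. This is an
illustration only: it assumes PRA is a z-score of the raw
promotion rate within each (paygrade, CMF) cell, which may
differ in detail from the construction defined earlier in the
thesis. The field names and the four records are hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev


def normalize_within_groups(records):
    """Add a 'pra' field: the promotion rate expressed as a
    z-score within the record's (paygrade, CMF) cell."""
    cells = defaultdict(list)
    for rec in records:
        cells[(rec["paygd"], rec["cmf"])].append(rec["rate"])
    stats = {key: (mean(vals), pstdev(vals))
             for key, vals in cells.items()}
    scored = []
    for rec in records:
        mu, sigma = stats[(rec["paygd"], rec["cmf"])]
        # A cell with no spread carries no information; score it 0.
        z = 0.0 if sigma == 0 else (rec["rate"] - mu) / sigma
        scored.append({**rec, "pra": z})
    return scored


# Hypothetical records, not thesis data.
records = [
    {"paygd": 5, "cmf": 11, "rate": 0.10},
    {"paygd": 5, "cmf": 11, "rate": 0.14},
    {"paygd": 6, "cmf": 71, "rate": 0.08},
    {"paygd": 6, "cmf": 71, "rate": 0.12},
]
scored = normalize_within_groups(records)
for rec in scored:
    print(rec)
```

With this construction every cell has a mean score of zero, so
membership in a particular CMF or paygrade should not by itself
raise or lower a score. That is the expectation the top three
percent subset appears to violate.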

The three inconsistencies appear to be linked in their
distributional change, as Figures 5.1, 5.2, and 5.3
demonstrate.

Figure 5.1 Top Versus Sample CMF Changes in Percent

Figure 5.1 demonstrates a clearly defined redistribution of
CMF percentages away from combat arms MOS's to the combat
service support MOS's. In particular, Infantry, Artillery,
and Armor MOS's lost a total of 15.5 percent, while the
Administrative Specialists (CMF 71) gained almost 9 percent.

Figure 5.2 Top vs Sample NCOE

Figure 5.2 demonstrates transfer of a large percentage of
the sample density away from the NCOE 7 to the NCOE 0 level.
This was consistent with the observations in Figure 5.1,
since only combat arms NCO's qualify for level 7, the Combat
Arms Primary Leadership course.

Figure 5.3 Top vs Sample PAYGD

The last figure, Figure 5.3, shows a displacement of
percentage from the E-6 to the E-5 paygrade as a result of
extracting only the top three percent by measure of promotion
rate.

To offer an explanation of the underlying reason for
these discrepancies is difficult. Some measure of this
discrepancy may well be explained in that the removal of
effects by normalizing the PRA scores was not entirely
adequate. The observed discrepancy may be simple
mathematical error. However, it can be noted that their
interrelationships do act consistently. Specifically, the
reduction in paygrade and combat MOS's both combine to
significantly reduce the NCOE level. As such, it is more
likely that the change in NCOE occurred coincident with the
changes in the two variables PAYGD and CMF. The effect being
demonstrated was one where junior combat service support
NCO's were dominating promotion achievement.



E. SUMMARY OF FINDINGS 

Comparing the changes in averages for the top performers
to the regression coefficients found in Chapter IV shows
very substantial agreement. Specifically, OAFQT was the most
significant intelligence test variable, while HIYRED was the
most significant academic variable. Although the percent
change in OAFQT is greater than HIYRED, it still has
considerably more variance than HIYRED. Thus, the predictive
ability of HIYRED in regression should be more pronounced
than that of OAFQTP. The less significant variables of
PQSCR, SEX, and RACETH each shifted a small, significant
amount in the appropriate direction.

The only discrepancy between the two procedures is the
change in the variable NCOE. This change is felt to have
been induced by changes in the CMF and PAYGD distributions.
The effect is one where junior combat service support NCO's
replace NCO's from the combat MOS's.

An important observation from analysis of the top three
percent was that the increase in the value of any explanatory
variable was not extreme. In fact, the largest increase was
only twenty-five percent. As an inference, it appears that
NCO's who do a little better in a combination of areas,
rather than much better in a single area, are more likely
recipients of faster promotion rates.

VI. PRINCIPAL COMPONENTS AND FACTOR ANALYSIS



A. INTRODUCTION 

In this chapter more advanced statistical procedures are
implemented to better summarize the independent variables,
and improve or at least simplify the cause-effect model.
Principal components and factor analysis are two closely
related procedures which are normally used in investigating
the mutual relationships and communalities of a large number
of variables. By identifying redundant variables, and by
constructing composite variables of the originals, it is
possible to reduce the number of independent explanatory
variables to only those which are significant and unique.

B. THEORY 

Principal components and factor analysis each use matrix
algebra to operate on a P by P matrix of correlation or
covariance coefficients and produce a system of eigenvectors
of the form:

$Y_j = a_{1j} X_1 + a_{2j} X_2 + \dots + a_{pj} X_p + E$

In the notation, $Y_j$ represents the resultant composite
variable, which is the linear combination formed with the
loading coefficients $a_{ij}$. These loading coefficients
multiply each of the original variables $X_i$,
$i = 1, \dots, p$. $E$ represents the amount of residual
error not accounted for by the linear model. [Ref. 5:p. 328]
The resulting eigenvectors represent a set of orthogonal
components jointly perpendicular in the space of the original
variables. [Ref. 15:p. 424] These components are jointly
uncorrelated and individually account for levels of variance,
where the first principal component accounts for the largest
proportion, and the last principal component accounts for the
smallest. A resulting component may be representative of
some aggregate characteristic of the original input
variables. For example, a resulting eigenvector which has
strong factor loadings for original variables of physical
strength and endurance could be called a factor of stamina as
an aggregate measure. Principal components and factor
analysis differ in that principal components assume and
require that a number of components equal to the number of
initial variables is needed to account for the total
variance. In contrast, the factor method assumes that there
exists a set of composites in a dimension smaller than the
dimension of the original number of variables which will
suffice. [Ref. 5:p. 622]
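
The extraction of components from a correlation matrix can be
sketched in a few lines. The example below is a minimal
illustration, not the SAS procedure used in this study: it
finds each eigenvalue and eigenvector by power iteration and
then removes that component (deflation) before finding the
next. The function names and the two-variable correlation
matrix are hypothetical.

```python
import math


def mat_vec(matrix, vec):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(a * v for a, v in zip(row, vec)) for row in matrix]


def power_iteration(matrix, iterations=500):
    """Dominant (eigenvalue, unit eigenvector) of a symmetric
    positive semi-definite matrix, by repeated multiplication."""
    vec = [1.0 + 0.1 * i for i in range(len(matrix))]  # arbitrary start
    norm = math.sqrt(sum(x * x for x in vec))
    vec = [x / norm for x in vec]
    for _ in range(iterations):
        nxt = mat_vec(matrix, vec)
        norm = math.sqrt(sum(x * x for x in nxt))
        if norm == 0.0:  # nothing left to extract
            break
        vec = [x / norm for x in nxt]
    eigenvalue = sum(v * w for v, w in zip(vec, mat_vec(matrix, vec)))
    return eigenvalue, vec


def deflate(matrix, eigenvalue, vec):
    """Remove an extracted component from the matrix."""
    n = len(matrix)
    return [[matrix[i][k] - eigenvalue * vec[i] * vec[k]
             for k in range(n)] for i in range(n)]


def principal_components(corr):
    """All components of a correlation matrix, largest first."""
    work = [row[:] for row in corr]
    components = []
    for _ in range(len(corr)):
        eigenvalue, vec = power_iteration(work)
        components.append((eigenvalue, vec))
        work = deflate(work, eigenvalue, vec)
    return components


# Hypothetical correlation matrix for two variables (r = 0.6).
components = principal_components([[1.0, 0.6], [0.6, 1.0]])
for eigenvalue, vec in components:
    print(round(eigenvalue, 4), [round(v, 4) for v in vec])
```

For two variables with correlation r the eigenvalues are 1 + r
and 1 - r, so the first component carries most of the variance
when the variables are strongly related, and the two
eigenvectors are perpendicular.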

An additional aspect of factor analysis is that it allows
for rotation of the solution with the intent of developing
more unique and well-defined components. For example, if
there are five variables in a factor which have intermediate
loading factors in the range .2 to .4, a rotation of common
factors by applying nonsingular linear transformations may
result in a pattern matrix in which the loadings are either
zero or close to one. The end result is easier to interpret
than the factor with numerous mixed elements. Graphical
measures are useful with the rotation procedure and allow the
analyst to see the relative uniqueness of the input
variables.
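
The effect of a rotation can be illustrated with the FACT2 and
FACT3 loadings that Table XXIV reports for four of the study
variables. The sketch below applies a plain orthogonal rotation
of the two axes, which is only one kind of the transformations
described above and is not a step reported in this study. The
37 degree angle is hand-picked for illustration and the
function names are hypothetical.

```python
import math

# (FACT2, FACT3) loadings copied from Table XXIV.
LOADINGS = {
    "EDLVL": (0.5861, 0.5024),
    "HIYRED": (0.6410, 0.4176),
    "NCOE": (-0.4507, 0.6668),
    "PAYGD": (-0.3467, 0.6770),
}

ANGLE_DEGREES = 37.0  # hand-picked, illustration only


def rotate(loading, degrees):
    """Express a pair of loadings on axes rotated
    counter-clockwise by the given angle."""
    theta = math.radians(degrees)
    x, y = loading
    return (x * math.cos(theta) + y * math.sin(theta),
            -x * math.sin(theta) + y * math.cos(theta))


def communality(loading):
    """Sum of squared loadings; unchanged by orthogonal rotation."""
    return sum(v * v for v in loading)


rotated = {name: rotate(pair, ANGLE_DEGREES)
           for name, pair in LOADINGS.items()}
for name, pair in rotated.items():
    print(f"{name:7s} {pair[0]:7.3f} {pair[1]:7.3f}")
```

After this rotation the two education measures load almost
entirely on the first axis and NCOE and PAYGD mostly on the
second, while each variable's sum of squared loadings is
unchanged.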

C. RESULTS

The SAS procedure for performing factor analysis was used
with the method of factor determination being the principal
component method. As such, basic principal component
analysis was conducted, but limits were applied on the number
of factors retained so that only the most significant
composite factors would be kept. The first set of input
variables included all of the twelve study variables. Table
XXIV shows the resulting factor solution. Appended below
each component is an interpretation explaining what the
aggregate factors represent. The original input variables
which contributed most to the factor have been underlined.
Following Table XXIV is a factor plot, Figure 6.1, where
each of the variables is coded by a letter. By observing the
plot, any lack of uniqueness for a group of variables can be
noted where the coded letters are close to one another.

TABLE XXIV  Principal Components Tabular Results

Input Matrix of correlation coefficients
PRIOR COMMUNALITY ESTIMATES: ONE

                 1       2       3       4       5       6       7
EIGENVALUE    4.0052  1.7334  1.4979  1.0634  0.8496  0.8028  0.7542
DIFFERENCE    2.2717  0.2355  0.4344  0.2138  0.0468  0.0486  0.2149
PROPORTION    0.3338  0.1445  0.1248  0.0886  0.0708  0.0669  0.0628
CUMULATIVE    0.3338  0.4782  0.6031  0.6910  0.7625  0.8294  0.8922

                 8       9      10      11      12
EIGENVALUE    0.5392  0.3500  0.2809  0.1196  0.0034
DIFFERENCE    0.1892  0.0690  0.1613  0.1161
PROPORTION    0.0449  0.0292  0.0234  0.0100  0.0003
CUMULATIVE    0.9372  0.9663  0.9897  0.9997  1.0000

7 FACTORS WILL BE RETAINED BY THE NFACTOR CRITERION

FACTOR PATTERN

           FACT1    FACT2    FACT3    FACT4    FACT5    FACT6    FACT7
EDLVL      .4302    .5861    .5024   -.2544   -.0624   -.0693   -.029
AFQTP      .9515   -.1133   -.1195    .0637   -.0075    .1548   -.024
EIMCAT     .9060   -.1220   -.1652   -.0598   -.0096    .1478    .011
NCOE      -.0085   -.4507    .6668    .2527   -.0398    .0084   -.134
HIYRED     .3834    .6410    .4176   -.3281   -.0637   -.0830   -.124
SEX        .1735    .4212   -.1113    .6516    .1857   -.0736   -.550
OAFQT      .9518   -.1046   -.1156    .0590   -.0092    .1535   -.023
GTSCR      .8238   -.1128    .0090    .0331   -.0464    .1350    .132
PQSCR      .4001   -.2413    .1205   -.1150   -.7312   -.4527    .115
CMF        .1677    .5200   -.1449    .4985   -.1171   -.2587    .561
PAYGD      .1216   -.3467    .6770    .3367   -.1816   -.0495    .151
RACETH    -.3590    .3130    .2547    .1229    .4708    .6507    .216

           Intell   Acad     Career   Sex      PQSCR    RACE     CMF
           Tests             Status

FINAL COMMUNALITY ESTIMATES: TOTAL = 10.706622

Figure 6.1 Plot of Factor Pattern for FACTOR1 and FACTOR3

EDLVL=A AFQTP=B EIMCAT=C NCOE=D HIYRED=E SEX=F
OAFQT=G GTSCR=H PQSCR=I CMF=J PAYGD=K RACETH=L

The results appear to be quite reasonable, where the most
significant factor is a composite of all the mental aptitude
measures: OAFQTP, AFQTP, GTSCR, and EIMCAT. The second
factor consists primarily of academic performance measures
EDLVL and HIYRED. The third factor is composed of NCOE and
PAYGD and reflects two closely related measures dominated by
paygrade. The fourth factor is predominantly a measure of
SEX and two other nominal variables, CMF and PAYGD. The
fifth, sixth and seventh factors all appear to be dominated
by single variables, PQSCR, RACE, and CMF respectively.
In short, each of the original twelve variables is in
some measure represented in the first five factors, which
account for over seventy-five percent of the variance. By
observing the entry for PROPORTION one can see that the
subsequent seven factors each contributed between .0668 and
.0028 of the variance and as such are not major
contributors.
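
The PROPORTION and CUMULATIVE rows of Table XXIV follow
directly from the eigenvalues. With a correlation matrix as
input, each standardized variable contributes one unit of
variance, so the total equals the number of variables (twelve
here) and each proportion is the eigenvalue divided by twelve.
A minimal sketch of that arithmetic, with hypothetical
function names:

```python
# Eigenvalues copied from Table XXIV.
EIGENVALUES = [4.0052, 1.7334, 1.4979, 1.0634, 0.8496, 0.8028,
               0.7542, 0.5392, 0.3500, 0.2809, 0.1196, 0.0034]


def proportions(eigenvalues, n_variables):
    """Share of total variance carried by each component when the
    input is a correlation matrix (total variance = n_variables)."""
    return [value / n_variables for value in eigenvalues]


def cumulative(values):
    """Running total of a list of values."""
    total, running = 0.0, []
    for value in values:
        total += value
        running.append(total)
    return running


props = proportions(EIGENVALUES, len(EIGENVALUES))
cums = cumulative(props)
for index, (p, c) in enumerate(zip(props, cums), start=1):
    print(f"component {index:2d}  proportion={p:.4f}  cumulative={c:.4f}")
```

The first five components reach a cumulative share of about
.76, and the seven retained eigenvalues sum to about 10.71,
which agrees with the final communality total printed under
the table.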

Using the results of the first solution a second analysis
was conducted with a reduced number of input variables. In
each of the initial solution factors the single variable
having the largest loading factor was selected and the other
related variables were eliminated. Table XXV shows the
results of that solution, and Figure 6.2 shows the Factor
Plot.

TABLE XXV  Reduced Principal Components Tabular Results

PRIOR COMMUNALITY ESTIMATES: ONE
Input Matrix of correlation coefficients

                 1       2       3       4       5       6       7
EIGENVALUE    2.1666  1.2063  1.0019  0.8703  0.8049  0.7081  0.2416
DIFFERENCE    0.9602  0.2044  0.1315  0.0654  0.0967  0.4665
PROPORTION    0.3095  0.1723  0.1431  0.1243  0.1150  0.1012  0.0345
CUMULATIVE    0.3095  0.4819  0.6250  0.7493  0.8643  0.9655  1.0000

7 FACTORS WILL BE RETAINED BY THE NFACTOR CRITERION

FACTOR PATTERN

           FACT1    FACT2    FACT3    FACT4    FACT5    FACT6    FACT7
NCOE       .0221   -.5422    .6941    .2656   -.3801   -.1071    .018
HIYRED     .3659    .5302    .3135   -.5162   -.2443   -.4001   -.004
SEX        .1803    .6532    .1514    .6993    .0899   -.1346   -.051
OAFQT      .8945    .0404   -.0412    .0502   -.0668    .2462   -.328
GTSCR      .8592   -.0374    .0154   -.0492   -.1259    .3664   -.328
PQSCR      .5069   -.3707    .2537   -.0613    .7141   -.2648   -.022
RACETH    -.4521    .3275    .5799   -.1589    .2487    .5031    .037

           Intell   Acad     NCOE     SEX      PQSCR    Race
           Tests

FINAL COMMUNALITY ESTIMATES: TOTAL = 7.000000

  NCOE    HIYRED   SEX      OAFQT    GTSCR    PQSCR    RACETH
  1.0000  1.0000   1.0000   1.0000   1.0000   1.0000   1.0000

Figure 6.2 Factor Plot (Plot of Factor Pattern for FACTOR1 and FACTOR2)

NCOE=A HIYRED=B SEX=C OAFQT=D GTSCR=E PQSCR=F RACETH=G

Restricting the input to the strongest unique variables
results in an almost complete separation into single factors.
The only exception is the grouping of GTSCR and OAFQT (E and
D). This is not surprising considering the composition of
both scores from the same set of tests in the ASVAB. Thus,
the decision to eliminate GTSCR from earlier regression
models makes sense from the Factor Analysis perspective as
well.

E. SUMMARY OF FINDINGS

The application of principal components and factor
analysis confirmed many of the patterns of dependency and
redundancy with the study variables. It confirmed the
choices for unique variables in the regression as developed
in Chapter IV, and gave a good second opinion for deciding
which variables could be set aside with little effect on the
model.

VII. CONCLUSION



A. OVERALL FINDINGS 

There is strong statistical evidence to support the
proposition that success in the Army, as measured by
promotion rate, is related to the individual's intelligence
test scores and previous academic background. The
explanatory variables of the 1980 normed AFQT score and the
individual's highest year of education at time of entry are
the most important indicators for a future promotion rate.
The highest year of education at time of entry is the more
important measure, but changes in its discrete scale
represent very substantial changes in academic background.
OAFQT is not nearly as important as HIYRED and can
independently affect the predicted promotion rate only up to
ten percent.

While in service, how well the individual scores on his
Performance Qualification Test and his attendance at
NCO schooling will be indicative of a faster promotion rate.

The statistical evidence for these observations can be
argued by showing the existence of significantly increasing
promotion rate averages across ascending levels of
explanatory measures in ANOVA and ANCOVA analysis. This
argument can be supplemented, and those differences seen more
concretely, by a simpler comparison of top performers versus
the sample averages.

Considerable variance of promotion rate exists across any 
of the levels of the discrete explanatory variables, and 
within any of the categorical variables. There is a dilemma 
in designing an effective dependent variable. While 
controlling categorical variables such as CMF and Paygrade, 
the effects of the other variables become more apparent and 
significant. However, the ability of the model to explain 
variance is significantly diminished. 

Selecting a set of the most important and unique
explanatory variables was achieved via two methods. A
successive, increasing dimension procedure distilled a set of
unique explanatory variables. This method relied upon
developing detailed familiarity with each variable. In the
process hypothesis testing was used to eliminate
insignificant contributors and identify the most important
variable from a group of related variables. This restricted
set of explanatory variables was confirmed with the use of
principal components, a method which uses a mathematical
approach to identify orthogonal and unique variables.

When using inferential procedures the resulting model
met regression assumptions, both parametrically and
nonparametrically. Further, the model estimates are
reproducible with an alternate data set.

Although the model is technically acceptable, it is only
accurate in predicting promotion values for population
subcategories. The low R² value and high mean square error
terms found during regression were manifested in model
testing. When making predictions based on incremental
changes in AFQT the sample data values were close, but upper
and lower bounds were so large that resulting predictions
were not useful.

The poor performance of the predictive model can be
attributed to two possible reasons. First, there may exist
some unspecified predictor variable which could be used to
better account for variance. Second, there may exist
significant inexplicable chance in the occurrence of a
promotion rate for any given individual.

In the case of the first reason, it should be observed
that the number of available entries held on a given
individual at either DMDC or MILPERCEN is limited. Of the
one hundred and forty data fields, this study considered all
entries which were felt to have potential merit as an
explanatory variable. This included several versions
expressing the same fundamental quality. Of the twelve
variables considered, the final number of significant
variables was reduced to only six. Overall, there are few
significant and unique measures available to use as
predictors. To discover additional explanatory variables
would require establishment of new personnel data elements in
those data bases. Potential candidates include evaluation
report averages or, possibly, the results of a personality
composite test. Alternatively, the quality of information on
academic performance could be increased, such as the
inclusion of grade averages from high school attendance
periods. The utility of this additional data would then have
to be evaluated in a manner similar to this thesis.

The second reason given for error is a more probable 
explanation, for the subject matter of this study is people, 
and not a more deterministic physical phenomenon. The 
resolution of a cause effect relationship is more subtle and 
more difficult to verify. Although this condition does not 
have a mathematical remedy, the judgement of whether or not 
even a small, highly variable measure of trend is sufficient 
still lies with the analyst and his ability to present that 
judgement to decision makers. 

B. POLICY RECOMMENDATIONS

The first question that must be answered in this section 
is whether or not having a predictive model is necessary to 
make policy decisions regarding promotion or accession. The 
answer offered in this document is that it is not. There is 
sufficiently reliable information resulting from hypothesis 
testing and subpopulation analysis to make cogent 
observations and decisions with. 

From the results of this investigation, accession policy
makers should closely manage the two attributes of OAFQT and
HIYRED. This recommendation is more a confirmation than a
proposal. The 1984 Defense Authorization Act already
places constraints on AFQT category and high school diploma
status.

The two in-service attributes that should be managed are
the Performance Qualification Score, and attendance at NCO
schooling. To directly tie scores on these attributes in the
form of promotion points or a minimum threshold scale would
be one approach. Unfortunately, this may artificially force
NCO's of less potential and aggressiveness into categories
with the more competent individuals. The result may be a
lessening of the discriminatory effectiveness of the two
measures.

If the individual were allowed to achieve his or her
score and pursue in-service education independent of
promotion policy, the ability of these variables to
discriminate would be better. However, not tying these
scores directly to promotion point values or thresholds
should not mean that either measure would be unused. A
policy where promotion boards were still instructed to review
an individual's scores, together with notification of this
review policy to the NCO population, allows for self
selection by the more ambitious individuals.

C. SUGGESTIONS FOR FURTHER RESEARCH 

One disturbing observation of this study was the apparent 
disparity among race and ethnic groups in terms of AFQT and 
promotion rates. As pointed out by Daula (1985) the 
explanation of this disparity cannot be seen in an aggregate 
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promotion data approach, but rather/ a duration model 
approach with a set group of individual soldiers over 
time. [Ref. ll:pp. 7-9] His paper reports that this disparity 
is a result of attrition. Specifically/ the shifting of 
subcategory promotion averages is a result of different 
retention patterns among race and ethnic groups / and not due 
to a racialy sensitive promotion system. 

A study to determine the magnitude and underlying reasons 
for the different retention patterns, and to test this 
hypothesis, would have considerable merit. 
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APPENDIX A 



CAREER MANAGEMENT FIELDS AND FREQUENCIES 



                                        CUMULATIVE  CUMULATIVE
MOSNAME        CMF  FREQUENCY  PERCENT   FREQUENCY     PERCENT

Infantry        11       4320     11.4        4320        11.4
Cbt Engineer    12       1030      2.7        5350        14.1
Artillery       13       2780      7.3        8130        21.5
Air Defense     16        851      2.2        8981        23.7
Special Ops     18        244      0.6        9225        24.4
Armor           19       2434      6.4       11659        30.8
Hawk Missile    23        187      0.5       11846        31.3
Nike Missile    27        352      0.9       12198        32.2
Tac Radar       28         40      0.1       12238        32.3
Tac Radar       29        625      1.7       12863        34.0
Communication   31       3265      8.6       16128        42.6
Elect Warfare   33         30      0.1       16158        42.7
Tech Drafter    51        619      1.6       16777        44.3
Chem Warfare    54        529      1.4       17306        45.7
Explosive Ord   55        400      1.1       17706        46.8
Repair          63       3766      9.9       21472        56.7
Cargo Spec      64       1041      2.8       22513        59.5
A/C Repair      67       1090      2.9       23603        62.4
Admin Spec      71       3020      8.0       26623        70.3
Programmer      74        423      1.1       27046        71.4
Supply          76       2677      7.1       29723        78.5
Recruiter       79        106      0.3       29829        78.8
Topo Eng        81         65      0.2       29894        79.0
AV Spec         84        157      0.4       30051        79.4
Medical         91       2498      6.6       32549        86.0
Lab Spec        92        444      1.2       32993        87.2
Air Traffic     93        175      0.5       33168        87.6
Food SVC        94        919      2.4       34087        90.0
Mil Police      95       1674      4.4       35761        94.5
Intelligence    96        789      2.1       36550        96.6
Musician        97        176      0.5       36726        97.0
EW/SIGINT       98       1125      3.0       37851       100.0
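
The cumulative columns of the table follow from the FREQUENCY 
column by simple arithmetic. The sketch below reproduces the 
first three rows; the function name frequency_table and the 
rounding to one decimal place are assumptions made for 
illustration, not a description of the software used to produce 
the appendix. 

```python
from itertools import accumulate

# Final cumulative frequency in Appendix A (total records in the table).
TOTAL_RECORDS = 37851

# First three rows of Appendix A as (MOSNAME, FREQUENCY).
ROWS = [("Infantry", 4320), ("Cbt Engineer", 1030), ("Artillery", 2780)]


def frequency_table(rows, total):
    """Return (name, freq, percent, cum_freq, cum_percent) for each row."""
    running = accumulate(freq for _, freq in rows)
    table = []
    for (name, freq), cum_freq in zip(rows, running):
        percent = round(100 * freq / total, 1)
        # Cumulative percent is taken from the cumulative frequency,
        # not from a running sum of the already rounded percents.
        cum_percent = round(100 * cum_freq / total, 1)
        table.append((name, freq, percent, cum_freq, cum_percent))
    return table


if __name__ == "__main__":
    for row in frequency_table(ROWS, TOTAL_RECORDS):
        print(row)
```

Computing the cumulative percent from the cumulative frequency 
is consistent with the printed value of 21.5 for Artillery; 
summing the rounded percents (11.4 + 2.7 + 7.3) would give 21.4. 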
APPENDIX B 



AFQT TRANSFORMATION EQUIVALENT SCORES 



Armed Forces Qualification Test (AFQT) 
Equivalent Percentile Scores for 1944 
Mobilization Population and 1980 Youth Population 



1980  1944      1980  1944      1980  1944

   1     1        34    33        67    66
   2     1        35    34        68    67
   3     2        36    35        69    68
   4     2        37    35        70    69
   5     3        38    36        71    70
   6     4        39    37        72    71
   7     5        40    38        73    72
   8     6        41    38        74    73
   9     6        42    39        75    74
  10     8        43    40        76    75
  11     8        44    41        77    76
  12    10        45    42        78    77
  13    11        46    42        79    78
  14    12        47    43        80    79
  15    14        48    44        81    80


15 